Integrating WPE: URI Scheme Handlers and Script Messages

Most Web content is designed entirely for screen display—and there is a
lot of it—so it will spend its life in the somewhat restricted sandbox
implemented by a web browser. But rich user interfaces using Web technologies
in all kinds of consumer devices require some degree of integration, an
escape hatch to interact with the rest of their software and hardware. This is
where a Web engine designed to be embeddable, like WPE, shines: not only does
WPE provide a stable API, it also supports a number of ways to integrate with
its environment beyond the plethora of available Web platform APIs.
Integrating a “Web view” (the main entry point of the WPE embedding
API) involves providing extension points, which allow the
Web content (HTML/CSS/JavaScript) it loads to call into native code provided
by the client application (typically written in C/C++) from JavaScript, and
vice versa. There are a number of ways in which this can be achieved:

URI scheme handlers allow native code to
register a custom URI
scheme, which will run a user-provided
function to produce content that can be “fetched” regularly.
User script messaging can be used to send JSON
messages from JavaScript running in the same context as Web pages to a user
function, and vice versa.
The JavaScriptCore API is a powerful solution to provide new JavaScript
functionality to Web content seamlessly, almost as if it were implemented
inside the Web engine itself, akin to Node.js C++ addons.

In this post we will explore the first two, as they can support many
interesting use cases without introducing the additional complexity of
extending the JavaScript virtual machine. Let’s dive in!
Intermission
We will be referring to the code of a tiny browser written for the occasion.
Telling WebKit how to call our native code involves creating a
WebKitUserContentManager, customizing it, and then
associating it with web views during their creation. The only exception
is URI scheme handlers, which are registered
using webkit_web_context_register_uri_scheme(). This
minimal browser includes an on_create_view function, which is the perfect
place to do the configuration:
static WebKitWebView*
on_create_view(CogShell *shell, CogPlatform *platform)
{
    g_autoptr(GError) error = NULL;
    WebKitWebViewBackend *view_backend = cog_platform_get_view_backend(platform, NULL, &error);
    if (!view_backend)
        g_error("Cannot obtain view backend: %s", error->message);

    g_autoptr(WebKitUserContentManager) content_manager = create_content_manager();  /** NEW! **/
    configure_web_context(cog_shell_get_web_context(shell));                         /** NEW! **/

    g_autoptr(WebKitWebView) web_view =
        g_object_new(WEBKIT_TYPE_WEB_VIEW,
                     "user-content-manager", content_manager,  /** NEW! **/
                     "settings", cog_shell_get_web_settings(shell),
                     "web-context", cog_shell_get_web_context(shell),
                     "backend", view_backend,
                     NULL);
    cog_platform_init_web_view(platform, web_view);
    webkit_web_view_load_uri(web_view, s_starturl);
    return g_steal_pointer(&web_view);
}

  What is g_autoptr?
    Does it relate to g_steal_pointer?
    This does not look like C!

In the shown code examples, g_autoptr(T) is a preprocessor macro provided by
GLib that declares a pointer variable of the T type, and arranges for
freeing resources automatically when the variable goes out of scope. For
objects this results in
g_object_unref()
being called.
Internally the macro takes advantage of the __attribute__((cleanup(...)))
compiler extension, which is supported by GCC and Clang. GLib also includes a
convenience macro that
can be used to define cleanups for your own types.
As for g_steal_pointer, it is useful to indicate that the ownership of a
pointer declared with g_autoptr is transferred out of the current
scope. The function takes the address of the pointer variable, returns the
pointer stored in it, and resets the variable to NULL, so the automatic
cleanup has nothing left to release.
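To make the mechanism less magical, here is a minimal sketch of the same idea in plain C, without GLib. The names AUTO_FREE, cleanup_free, steal_pointer, dup_string, measure, and make_copy are made up for this illustration and are not part of GLib; the sketch only assumes a compiler that supports the cleanup attribute (GCC or Clang).

```c
#include <stdlib.h>
#include <string.h>

/* Counts how many buffers the cleanup handler actually released. */
static int freed_count = 0;

/* Cleanup handler: the compiler passes a pointer to the variable
 * that is going out of scope. */
static void cleanup_free(char **p)
{
    if (*p) {
        free(*p);
        freed_count++;
    }
}

/* Hypothetical stand-in for g_autofree. */
#define AUTO_FREE __attribute__((cleanup(cleanup_free)))

/* Hypothetical stand-in for g_steal_pointer(): hand the pointer out
 * and leave NULL behind, so the cleanup handler has nothing to do. */
static char *steal_pointer(char **p)
{
    char *stolen = *p;
    *p = NULL;
    return stolen;
}

static char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);
    if (copy)
        memcpy(copy, s, size);
    return copy;
}

/* The temporary copy is released automatically on return. */
static size_t measure(const char *text)
{
    AUTO_FREE char *copy = dup_string(text);
    return copy ? strlen(copy) : 0;
}

/* Ownership moves to the caller, who must call free() later. */
static char *make_copy(const char *text)
{
    AUTO_FREE char *copy = dup_string(text);
    return steal_pointer(&copy);
}
```

Calling measure() bumps freed_count because the copy is released when the function returns, while make_copy() leaves the counter untouched: the buffer survives and the caller is responsible for it.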

The size has been kept small thanks to reusing code from the Cog
core library. As a bonus, it should
run on Wayland, X11, and even on a bare display using the DRM/KMS
subsystem directly. Compiling and running it, assuming you already have the
dependencies installed, should be as easy as running:
cc -o minicog minicog.c $(pkg-config cogcore --libs --cflags)
./minicog wpewebkit.org
If the current session kind is not automatically detected, a second parameter
can be used to manually choose among wl (Wayland), x11, drm, and so on:
./minicog wpewebkit.org x11
The full, unmodified source for this minimal browser is included right below.

  Complete minicog.c source
    (Gist)

/*
 * SPDX-License-Identifier: MIT
 *
 * cc -o minicog minicog.c $(pkg-config wpe-webkit-1.1 cogcore --cflags --libs)
 */

#include <cog/cog.h>

static const char *s_starturl = NULL;

static WebKitWebView*
on_create_view(CogShell *shell, CogPlatform *platform)
{
    g_autoptr(GError) error = NULL;
    WebKitWebViewBackend *view_backend = cog_platform_get_view_backend(platform, NULL, &error);
    if (!view_backend)
        g_error("Cannot obtain view backend: %s", error->message);

    g_autoptr(WebKitWebView) web_view =
        g_object_new(WEBKIT_TYPE_WEB_VIEW,
                     "settings", cog_shell_get_web_settings(shell),
                     "web-context", cog_shell_get_web_context(shell),
                     "backend", view_backend,
                     NULL);
    cog_platform_init_web_view(platform, web_view);
    webkit_web_view_load_uri(web_view, s_starturl);
    return g_steal_pointer(&web_view);
}

int
main(int argc, char *argv[])
{
    g_set_application_name("minicog");

    if (argc != 2 && argc != 3) {
        g_printerr("Usage: %s URL [platform]\n", argv[0]);
        return EXIT_FAILURE;
    }

    g_autoptr(GError) error = NULL;
    if (!(s_starturl = cog_uri_guess_from_user_input(argv[1], TRUE, &error)))
        g_error("Invalid URL '%s': %s", argv[1], error->message);

    cog_modules_add_directory(COG_MODULEDIR);

    g_autoptr(GApplication) app = g_application_new(NULL, G_APPLICATION_DEFAULT_FLAGS);
    g_autoptr(CogShell) shell = cog_shell_new("minicog", FALSE);
    g_autoptr(CogPlatform) platform =
        cog_platform_new((argc == 3) ? argv[2] : g_getenv("COG_PLATFORM"), &error);
    if (!platform)
        g_error("Cannot create platform: %s", error->message);

    if (!cog_platform_setup(platform, shell, "", &error))
        g_error("Cannot setup platform: %s\n", error->message);

    g_signal_connect(shell, "create-view", G_CALLBACK(on_create_view), platform);
    g_signal_connect_swapped(app, "shutdown", G_CALLBACK(cog_shell_shutdown), shell);
    g_signal_connect_swapped(app, "startup", G_CALLBACK(cog_shell_startup), shell);
    g_signal_connect(app, "activate", G_CALLBACK(g_application_hold), NULL);

    return g_application_run(app, 1, argv);
}

URI Scheme Handlers

  URI syntax (CC BY-SA 4.0,
    source),
    notice the “scheme” component at the top left.

A URI scheme handler allows “teaching” the web engine how to handle any
load (pages, subresources, the Fetch API,
XMLHttpRequest, …) for a custom scheme. If you ever wondered how Firefox implements
about:config or how Chromium does chrome://flags, this is it. Also,
WPE WebKit has public API for this. Roughly:

A custom URI scheme is registered using
webkit_web_context_register_uri_scheme(). This also associates a callback function to it.
When WebKit detects a load for the scheme, it invokes the provided
function, passing a
WebKitURISchemeRequest.
The function generates data to be returned as the result of the load,
as a GInputStream
and calls webkit_uri_scheme_request_finish(). This sends the stream to WebKit as the
response, indicating the length of the response (if known), and the
MIME content type of the data in the stream.
WebKit will now read the data from the input stream.
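The first two steps can be pictured with a toy model in plain C. This is only a sketch of the registration and dispatch idea, not how WebKit implements it: SchemeHandler, register_scheme, dispatch_load, and count_request are hypothetical names, the lookup is a fixed-size table, and schemes are compared case-sensitively for brevity. Producing the response stream (the last two steps) is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: receives the full URI plus the user data. */
typedef void (*SchemeHandler)(const char *uri, void *userdata);

typedef struct {
    const char   *scheme;
    SchemeHandler handler;
    void         *userdata;
} SchemeEntry;

#define MAX_SCHEMES 8
static SchemeEntry registry[MAX_SCHEMES];
static size_t registry_len = 0;

/* Associate a scheme name with a callback. Returns 0 on success,
 * -1 when the table is full. */
static int register_scheme(const char *scheme, SchemeHandler handler, void *userdata)
{
    if (registry_len == MAX_SCHEMES)
        return -1;
    registry[registry_len++] = (SchemeEntry) { scheme, handler, userdata };
    return 0;
}

/* Find the handler matching the text before the first ':' and invoke it.
 * Returns 1 when a handler ran, 0 when the scheme is unknown. */
static int dispatch_load(const char *uri)
{
    const char *colon = strchr(uri, ':');
    if (!colon)
        return 0;

    size_t scheme_len = (size_t) (colon - uri);
    for (size_t i = 0; i < registry_len; i++) {
        if (strlen(registry[i].scheme) == scheme_len &&
            strncmp(registry[i].scheme, uri, scheme_len) == 0) {
            registry[i].handler(uri, registry[i].userdata);
            return 1;
        }
    }
    return 0;
}

/* Example handler: counts requests through its user data. */
static void count_request(const char *uri, void *userdata)
{
    (void) uri;
    (*(int *) userdata)++;
}
```

With an echo scheme registered, a load of echo://hello reaches count_request, while https://wpewebkit.org falls through because nothing was registered for it.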

Echoes
Let’s add an echo handler to our minimal browser that
replies back with the requested URI. Registering the scheme is
straightforward enough:
static void
configure_web_context(WebKitWebContext *context)
{
    webkit_web_context_register_uri_scheme(context,
                                           "echo",
                                           handle_echo_request,
                                           NULL /* userdata */,
                                           NULL /* destroy_notify */);
}

  What are “user data” and “destroy notify”?

The userdata parameter above is a convention used in many C libraries, and
especially in those based on GLib, when there are callback functions involved.
It allows the user to supply a pointer to arbitrary data, which will be
passed as a parameter to the callback (handle_echo_request in the
example) when it gets invoked later on.
As for the destroy_notify parameter, it allows passing a function with the
signature void func(void*) (type
GDestroyNotify) which
is invoked with userdata as the argument once the user data is no longer
needed. In the example above, this callback function would be invoked when the
URI scheme is unregistered. Or, from a different perspective, this callback is
used to notify that the user data can now be destroyed.
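The convention is easy to reproduce in plain C. The following sketch uses made-up names (Registration, registration_set, registration_invoke, registration_clear, Stats) and no GLib at all; it only shows who owns the user data and when the destroy notification fires.

```c
#include <stdlib.h>

typedef void (*Callback)(void *userdata);
typedef void (*DestroyNotify)(void *userdata);

/* What a library typically stores when a callback is registered. */
typedef struct {
    Callback      callback;
    void         *userdata;
    DestroyNotify destroy_notify;
} Registration;

static void registration_set(Registration *reg, Callback callback,
                             void *userdata, DestroyNotify destroy_notify)
{
    reg->callback = callback;
    reg->userdata = userdata;
    reg->destroy_notify = destroy_notify;
}

/* The library calls back into user code, handing over the user data. */
static void registration_invoke(const Registration *reg)
{
    if (reg->callback)
        reg->callback(reg->userdata);
}

/* On unregistering, the library tells the user that the data is no
 * longer needed, then forgets about it. */
static void registration_clear(Registration *reg)
{
    if (reg->destroy_notify)
        reg->destroy_notify(reg->userdata);
    *reg = (Registration) { NULL, NULL, NULL };
}

/* User side: some state shared with the callback. */
typedef struct {
    int requests;
} Stats;

static int stats_destroyed = 0;

static void on_request(void *userdata)
{
    ((Stats *) userdata)->requests++;
}

static void stats_free(void *userdata)
{
    free(userdata);
    stats_destroyed++;
}
```

The user allocates a Stats, registers it together with on_request and stats_free, and from then on never frees it by hand: registration_clear() takes care of that at the right time.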

One way of implementing handle_echo_request() could be taking the request
URI, which is part of the WebKitURISchemeRequest parameter to the handler,
stashing it into a GBytes
container, and creating an input stream to read back its
contents:
static void
handle_echo_request(WebKitURISchemeRequest *request, void *userdata)
{
    const char *request_uri = webkit_uri_scheme_request_get_uri(request);
    g_print("Request URI: %s\n", request_uri);

    g_autoptr(GBytes) data = g_bytes_new(request_uri, strlen(request_uri));
    g_autoptr(GInputStream) stream = g_memory_input_stream_new_from_bytes(data);

    webkit_uri_scheme_request_finish(request, stream, g_bytes_get_size(data), "text/plain");
}
Note how we need to tell WebKit how to finish the load
request,
in this case only with the data stream, but it is possible to have more
control of the
response
or return an
error.
With these changes, it is now possible to make page loads from the new custom
URI scheme:

  It worked!

Et Tu, CORS?
The main roadblock one may find when using custom URI schemes is that loads
are affected by CORS
checks. Not only that, WebKit by default does not allow sending cross-origin
requests to custom URI schemes. This is by design: instead of accidentally
leaking potentially sensitive data to websites, developers embedding a web
view need to consciously opt in to allow CORS requests and
send back suitable Access-Control-Allow-* response headers.
In practice, the additional setup involves
retrieving
the WebKitSecurityManager being used by the WebKitWebContext and
registering the scheme as
CORS-enabled.
Then, in the handler function for the custom URI scheme, create a
WebKitURISchemeResponse,
which allows fine-grained control of the response, including setting
headers,
and finishing the request instead with
webkit_uri_scheme_request_finish_with_response().
Note that WebKit cuts some corners when using CORS with custom URI schemes:
handlers will not receive preflight OPTIONS requests. Instead, the CORS
headers from the replies are inspected, and if access needs to be denied
then the data stream with the response contents is discarded.
In addition to providing a complete CORS-enabled custom URI scheme example,
we recommend the Will It CORS? tool
to help troubleshoot issues.
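As a mental model of the header inspection described above, the decision can be sketched in a few lines of plain C. This is a deliberately simplified illustration and not WebKit's actual logic: it only looks at Access-Control-Allow-Origin, ignores credentials, methods, and request headers, and the function names cors_allows_origin and deliver_response are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the Access-Control-Allow-Origin value sent by the handler
 * lets a page from page_origin read the response. A missing header
 * (NULL) means access is denied. */
static bool cors_allows_origin(const char *page_origin,
                               const char *allow_origin_header)
{
    if (!page_origin || !allow_origin_header)
        return false;
    if (strcmp(allow_origin_header, "*") == 0)
        return true;
    return strcmp(allow_origin_header, page_origin) == 0;
}

/* Number of body bytes exposed to the page: everything when the check
 * passes, nothing (the stream is discarded) when it fails. */
static size_t deliver_response(const char *page_origin,
                               const char *allow_origin_header,
                               size_t body_length)
{
    return cors_allows_origin(page_origin, allow_origin_header) ? body_length : 0;
}
```

The practical consequence for a handler is that forgetting the header is indistinguishable, from the page's point of view, from a load that returned nothing.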
Further Ideas
Once we have WPE WebKit calling into our custom code, there are no limits
to what a URI scheme handler can do—as long as it involves replying
to requests. Here are some ideas:

Allow pages to access a subset of paths from the local file system in a
controlled way (as CORS applies). For inspiration,
see CogDirectoryFilesHandler.
Package all your web application assets into a single ZIP file, making
loads from app:/... fetch content from it. Or, make the scheme handler
load data using GResource and bundle the application
inside your program.
Use the presence of a well-known custom URI to have a web application
realize that it is running on a certain device, and make its user
interface adapt accordingly.
Provide a REST API, which internally calls into
NetworkManager to list and configure
wireless network adapters. Combine it with a local web application and
embedded devices can now easily get on the network.
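For the ideas that serve files or bundled assets, the delicate part is mapping the path from the URI onto storage without letting a page escape the intended directory. Below is a small sketch in plain C of how that check might look. The function names and the /usr/share/myapp directory used in the usage note are made up for illustration, and the code assumes the path has already been percent-decoded; symbolic links and similar filesystem details are not handled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True when any '/'-separated segment of path is exactly "..". */
static bool has_parent_segment(const char *path)
{
    const char *p = path;
    while (*p) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t) (end - p) : strlen(p);
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return true;
        if (!end)
            break;
        p = end + 1;
    }
    return false;
}

/* Builds "<root><uri_path>" into out. Fails when the path is not
 * absolute, tries to climb out of root, or does not fit in out. */
static bool resolve_asset_path(const char *root, const char *uri_path,
                               char *out, size_t out_size)
{
    if (!uri_path || uri_path[0] != '/' || has_parent_segment(uri_path))
        return false;
    int written = snprintf(out, out_size, "%s%s", root, uri_path);
    return written >= 0 && (size_t) written < out_size;
}

/* Picks a MIME type from the file extension, with a generic fallback. */
static const char *guess_content_type(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (dot && strcmp(dot, ".html") == 0)
        return "text/html";
    if (dot && strcmp(dot, ".css") == 0)
        return "text/css";
    if (dot && strcmp(dot, ".js") == 0)
        return "text/javascript";
    return "application/octet-stream";
}
```

A handler would call resolve_asset_path("/usr/share/myapp", path, buffer, sizeof buffer), reply with an error when it returns false, and otherwise open the file and pass guess_content_type() as the MIME type of the response.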

User Script Messages
While URI scheme handlers
allow streaming large chunks of data back into the Web engine, for exchanging
smaller pieces of information in a more programmatic fashion it may be
preferable to exchange messages without the need to trigger resource loads.
The user script messages part of the
WebKitUserContentManager API can be used this way:

Register a user message handler with
webkit_user_content_manager_register_script_message_handler().
As opposed to URI scheme handlers, this only enables receiving messages,
but does not associate a callback function yet.
Associate a callback to the
script-message-received
signal. The signal detail should be the name of the registered handler.
Now, whenever JavaScript code calls
window.webkit.messageHandlers.<name>.postMessage(), the signal is
emitted, and the native callback functions are invoked.

  Haven't I seen postMessage() elsewhere?

Yes,
you
have.
The name is the same because it provides a similar functionality (send a
message), it guarantees little (the receiver should validate messages), and
there are similar
restrictions
in the kind of values that can be passed along.

It’s All JavaScript
Let’s add a feature to our minimal browser that will allow
JavaScript code to trigger rebooting or powering off the device where it is
running. While this should definitely not be functionality exposed to the
open Web, it is perfectly acceptable in an embedded device where we control
what gets loaded with WPE, and that exclusively uses a web application as its
user interface.

  Yet most of the code shown in this post is C.

First, create a WebKitUserContentManager, register the message handler,
and connect a callback to its associated signal:
static WebKitUserContentManager*
create_content_manager(void)
{
    g_autoptr(WebKitUserContentManager) content_manager = webkit_user_content_manager_new();
    webkit_user_content_manager_register_script_message_handler(content_manager, "powerControl");
    g_signal_connect(content_manager, "script-message-received::powerControl",
                     G_CALLBACK(handle_power_control_message), NULL);
    return g_steal_pointer(&content_manager);
}
The callback receives a WebKitJavascriptResult, from which we
can get the JSCValue with the contents of the parameter
passed to the postMessage() function. The JSCValue can now be inspected
to check for malformed messages and determine the action to take, and
then arrange to call reboot():
static void
handle_power_control_message(WebKitUserContentManager *content_manager,
                             WebKitJavascriptResult *js_result, void *userdata)
{
    JSCValue *value = webkit_javascript_result_get_js_value(js_result);
    if (!jsc_value_is_string(value)) {
        g_warning("Invalid powerControl message: argument is not a string");
        return;
    }

    g_autofree char *value_as_string = jsc_value_to_string(value);
    int action;
    if (strcmp(value_as_string, "poweroff") == 0) {
        action = RB_POWER_OFF;
    } else if (strcmp(value_as_string, "reboot") == 0) {
        action = RB_AUTOBOOT;
    } else {
        g_warning("Invalid powerControl message: '%s'", value_as_string);
        return;
    }

    g_message("Device will %s now!", value_as_string);
    sync(); reboot(action);
}
Note that the reboot() system call above will most likely fail because it
needs administrative privileges. While the browser process could run as root
to sidestep this issue—definitely not recommended!—it would be
better to grant the CAP_SYS_BOOT capability to the process, and much
better to ask the system manager daemon to handle the job. In machines
using systemd a good option is to call the .PowerOff()
and .Reboot() methods of its org.freedesktop.systemd1.Manager interface.
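Whichever mechanism ends up performing the action, it helps to keep the validation of the message separate from the side effect, so it can be tested without actually rebooting anything. A possible sketch in plain C follows; the PowerAction enumeration and parse_power_action() are hypothetical names introduced here, and the caller would translate the result to RB_POWER_OFF or RB_AUTOBOOT, or to the corresponding request to the system manager.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    POWER_ACTION_POWEROFF,
    POWER_ACTION_REBOOT,
} PowerAction;

/* Maps the string received from JavaScript to an action. Returns false
 * for anything that is not an exact, known command, leaving *action
 * untouched. */
static bool parse_power_action(const char *message, PowerAction *action)
{
    static const struct {
        const char *name;
        PowerAction action;
    } known[] = {
        { "poweroff", POWER_ACTION_POWEROFF },
        { "reboot",   POWER_ACTION_REBOOT },
    };

    if (!message)
        return false;

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(message, known[i].name) == 0) {
            *action = known[i].action;
            return true;
        }
    }
    return false;
}
```

Adding a new command then means adding one row to the table, and the signal callback shrinks to converting the JSCValue to a string, calling this function, and acting on the result.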
Now we can write a small HTML document with some JavaScript sprinkled on top
to arrange sending the messages:
<html>
  <head>
    <meta charset="utf-8" />
    <title>Device Power Control</title>
  </head>
  <body>
    <button id="reboot">Reboot</button>
    <button id="poweroff">Power Off</button>
    <script type="text/javascript">
      function addHandler(name) {
        document.getElementById(name).addEventListener("click", (event) => {
          window.webkit.messageHandlers.powerControl.postMessage(name);
          return false;
        });
      }
      addHandler("reboot");
      addHandler("poweroff");
    </script>
  </body>
</html>
The complete source code for this example can be found
in this Gist.
Going In The Other Direction
But how can one return values from user messages back to the JavaScript code
running in the context of the web page? Until recently, the only option
available was exposing some known function in the page’s JavaScript code, and
then using
webkit_web_view_run_javascript()
to call it from native code later on. To make this more idiomatic and allow
waiting on a Promise, an approach like the following works:

Have convenience JavaScript functions wrapping the calls to
.postMessage() which add a unique identifier as part of the message,
create a Promise, and store it in a Map indexed by the identifier.
The Promise is itself returned from the functions.
When the callbacks in native code handle messages, they need to take
note of the message identifier, and then use
webkit_web_view_run_javascript() to pass it back, along with the
information needed to resolve the promise.
The JavaScript code running in the page takes the Promise from
the Map that corresponds to the identifier, and resolves it.
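On the native side, the second step boils down to building a small piece of JavaScript that carries the identifier and the result back to the page. The sketch below shows one way to format it in plain C. Note that window.__resolveReply is a hypothetical helper that the page would have to define (it would look up the identifier in the Map and resolve the stored Promise), and build_reply_script() is likewise a made-up name.

```c
#include <stdbool.h>
#include <stdio.h>

/* Formats the script text that native code would hand over to
 * webkit_web_view_run_javascript(). The json_payload argument must
 * already be valid, trusted JSON text: it is pasted verbatim into the
 * script, so passing unvalidated input would allow script injection.
 * Returns false when the result does not fit in out. */
static bool build_reply_script(char *out, size_t out_size,
                               unsigned long message_id,
                               const char *json_payload)
{
    int written = snprintf(out, out_size,
                           "window.__resolveReply(%lu, %s);",
                           message_id, json_payload);
    return written >= 0 && (size_t) written < out_size;
}
```

For message 7 and the payload {"ok":true}, the page would end up running window.__resolveReply(7, {"ok":true}); and the helper takes it from there.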

To make this approach a bit more palatable, we could tell WebKit to inject a
script
along with the regular content, which would provide the helper
functions
needed to achieve this.
Nevertheless, the approach outlined above is cumbersome and can be
tricky to get right, not to mention that the effort needs to be duplicated in
each application. Therefore, we have recently added new API hooks to provide this
as a built-in feature, so starting in WPE WebKit 2.40 the recommended approach
involves using
webkit_user_content_manager_register_script_message_handler_with_reply()
to register handlers instead. This way, calling .postMessage() now returns a
Promise to the JavaScript code, and the callbacks connected to the
script-message-with-reply-received
signal now receive a
WebKitScriptMessageReply,
which can be used to resolve the promise—either on the spot, or
asynchronously later on.
Even More Ideas
User script messages are a powerful and rather flexible facility to make WPE
integrate web content into a complete system. The provided example is rather
simple, but as long as we do not need to pass huge amounts of data in
messages the possibilities are almost endless—especially with the
added convenience in WPE WebKit 2.40. Here are more ideas that can
be built on top of user script messages:

A handler could receive requests to “monitor” some object, and
return a Promise that gets resolved when it has received changes.
For example, this could make the user interface of a smart thermostat
react to temperature updates from a sensor.
A generic handler that takes the message payload and converts it into
D-Bus method calls, allowing
web pages to control many aspects of a typical Linux system.

Wrapping Up
WPE has been designed from the ground up to integrate with the rest of the
system, instead of having a sole focus on rendering Web content inside a
monolithic web browser application. Accordingly, the public API must be
comprehensive enough to use it as a component of any application. This
results in features that allow plugging into the web engine at different
layers to provide custom behaviour.
At Igalia we have years of experience embedding WebKit into all kinds of
applications, and we are always sympathetic to the needs of such systems. If
you are interested in collaborating with WPE development, or searching for a
solution that can tightly integrate web content in your device, feel free to
contact us.

					This article was written by Adrián Pérez.

					I have been working on WebKit since 2012, with a focus on
					environment integration, embedding, and distribution. Igalia
					has been a life-long project since even earlier.