Integrating WPE: URI Scheme Handlers and Script Messages

Most Web content is designed entirely for screen display—and there is a lot of it—so it will spend its life in the somewhat restricted sandbox implemented by a web browser. But rich user interfaces using Web technologies in all kinds of consumer devices require some degree of integration, an escape hatch to interact with the rest of their software and hardware. This is where a Web engine like WPE designed to be embeddable shines: not only does WPE provide a stable API, it is also comprehensive in supporting a number of ways to integrate with its environment further than the plethora of available Web platform APIs.

Integrating a “Web view” (the main entry point of the WPE embedding API) involves providing extension points, which allow the Web content (HTML/CSS/JavaScript) it loads to call into native code provided by the client application (typically written in C/C++) from JavaScript, and vice versa. There are a number of ways in which this can be achieved:

  • URI scheme handlers allow native code to register a custom URI scheme, which will run a user provided function to produce content that can be “fetched” regularly.
  • User script messaging can be used to send JSON messages from JavaScript running in the same context as Web pages to an user function, and vice versa.
  • The JavaScriptCore API is a powerful solution to provide new JavaScript functionality to Web content seamlessly, almost as if they were implemented inside the Web engine itself—akin to NodeJS C++ addons.

In this post we will explore the first two, as they can support many interesting use cases without introducing the additional complexity of extending the JavaScript virtual machine. Let’s dive in!

Intermission

We will be referring to the code of a tiny browser written for the occasion. Telling WebKit how to call our native code involves creating a WebKitUserContentManager, customizing it, and then associating it with web views during their creation. The only exception to this are URI scheme handlers, which are registered using webkit_web_context_register_uri_scheme(). This minimal browser includes an on_create_view function, which is the perfect place to do the configuration:

static WebKitWebView*
on_create_view(CogShell *shell, CogPlatform *platform)
{
    g_autoptr(GError) error = NULL;
    WebKitWebViewBackend *view_backend = cog_platform_get_view_backend(platform, NULL, &error);
    if (!view_backend)
        g_error("Cannot obtain view backend: %s", error->message);

    g_autoptr(WebKitUserContentManager) content_manager = create_content_manager();  /** NEW! **/
    configure_web_context(cog_shell_get_web_context(shell));                         /** NEW! **/
 
    g_autoptr(WebKitWebView) web_view =
        g_object_new(WEBKIT_TYPE_WEB_VIEW,
                     "user-content-manager", content_manager,  /** NEW! **/
                     "settings", cog_shell_get_web_settings(shell),
                     "web-context", cog_shell_get_web_context(shell),
                     "backend", view_backend,
                     NULL);
    cog_platform_init_web_view(platform, web_view);
    webkit_web_view_load_uri(web_view, s_starturl);
    return g_steal_pointer(&web_view);
}
What is g_autoptr? Does it relate to g_steal_pointer? This does not look like C!

In the shown code examples, g_autoptr(T) is a preprocessor macro provided by GLib that declares a pointer variable of the T type, and arranges for freeing resources automatically when the variable goes out of scope. For objects this results in g_object_unref() being called.

Internally the macro takes advantage of the __attribute__((cleanup, ...)) compiler extension, which is supported by GCC and Clang. GLib also includes a convenience macro that can be used to define cleanups for your own types.

As for g_steal_pointer, it is useful to indicate that the ownership of a pointer declared with g_autoptr is transferred outside from the current scope. The function returns the same pointer passed as parameter and resets it to NULL, thus preventing cleanup functions from running.

The size has been kept small thanks to reusing code from the Cog core library. As a bonus, it should run on Wayland, X11, and even on a bare display using the DRM/KMS subsystem directly. Compiling and running it, assuming you already have the dependencies installed, should be as easy as running:

cc -o minicog minicog.c $(pkg-config cogcore --libs --cflags)
./minicog wpewebkit.org

If the current session kind is not automatically detected, a second parameter can be used to manually choose among wl (Wayland), x11, drm, and so on:

./minicog wpewebkit.org x11

The full, unmodified source for this minimal browser is included right below.

Complete minicog.c source (Gist)

/*
 * SPDX-License-Identifier: MIT
 *
 * cc -o minicog minicog.c $(pkg-config wpe-webkit-1.1 cogcore --cflags --libs)
 */
 
#include <cog/cog.h>
 
static const char *s_starturl = NULL;
 
static WebKitWebView*
on_create_view(CogShell *shell, CogPlatform *platform)
{
    g_autoptr(GError) error = NULL;
    WebKitWebViewBackend *view_backend = cog_platform_get_view_backend(platform, NULL, &error);
    if (!view_backend)
        g_error("Cannot obtain view backend: %s", error->message);
 
    g_autoptr(WebKitWebView) web_view =
        g_object_new(WEBKIT_TYPE_WEB_VIEW,
                     "settings", cog_shell_get_web_settings(shell),
                     "web-context", cog_shell_get_web_context(shell),
                     "backend", view_backend,
                     NULL);
    cog_platform_init_web_view(platform, web_view);
    webkit_web_view_load_uri(web_view, s_starturl);
    return g_steal_pointer(&web_view);
}
 
int
main(int argc, char *argv[])
{
    g_set_application_name("minicog");
 
    if (argc != 2 && argc != 3) {
        g_printerr("Usage: %s [URL [platform]]\n", argv[0]);
        return EXIT_FAILURE;
    }
 
    g_autoptr(GError) error = NULL;
    if (!(s_starturl = cog_uri_guess_from_user_input(argv[1], TRUE, &error)))
        g_error("Invalid URL '%s': %s", argv[1], error->message);
 
    cog_modules_add_directory(COG_MODULEDIR);
 
    g_autoptr(GApplication) app = g_application_new(NULL, G_APPLICATION_DEFAULT_FLAGS);
    g_autoptr(CogShell) shell = cog_shell_new("minicog", FALSE);
    g_autoptr(CogPlatform) platform =
        cog_platform_new((argc == 3) ? argv[2] : g_getenv("COG_PLATFORM"), &error);
    if (!platform)
        g_error("Cannot create platform: %s", error->message);
 
    if (!cog_platform_setup(platform, shell, "", &error))
        g_error("Cannot setup platform: %s\n", error->message);
 
    g_signal_connect(shell, "create-view", G_CALLBACK(on_create_view), platform);
    g_signal_connect_swapped(app, "shutdown", G_CALLBACK(cog_shell_shutdown), shell);
    g_signal_connect_swapped(app, "startup", G_CALLBACK(cog_shell_startup), shell);
    g_signal_connect(app, "activate", G_CALLBACK(g_application_hold), NULL);
 
    return g_application_run(app, 1, argv);
}

URI Scheme Handlers

“Railroad” diagram of URI syntax
URI syntax (CC BY-SA 4.0, source), notice the “scheme” component at the top left.

A URI scheme handler allows “teaching” the web engine how to handle any load (pages, subresources, the Fetch API, XmlHttpRequest, …)—if you ever wondered how Firefox implements about:config or how Chromium does chrome://flags, this is it. Also, WPE WebKit has public API for this. Roughly:

  1. A custom URI scheme is registered using webkit_web_context_register_uri_scheme(). This also associates a callback function to it.
  2. When WebKit detects a load for the scheme, it invokes the provided function, passing a WebKitURISchemeRequest.
  3. The function generates data to be returned as the result of the load, as a GInputStream and calls webkit_uri_scheme_request_finish(). This sends the stream to WebKit as the response, indicating the length of the response (if known), and the MIME content type of the data in the stream.
  4. WebKit will now read the data from the input stream.

Echoes

Let’s add an echo handler to our minimal browser that replies back with the requested URI. Registering the scheme is straightforward enough:

static void
configure_web_context(WebKitWebContext *context)
{
    webkit_web_context_register_uri_scheme(context,
                                           "echo",
                                           handle_echo_request,
                                           NULL /* userdata */,
                                           NULL /* destroy_notify */);
}
What are “user data” and “destroy notify”?

The userdata parameter above is a convention used in many C libraries, and specially in these based on GLib when there are callback functions involved. It allows the user to supply a pointer to arbitrary data, which will be passed later on as a parameter to the callback (handle_echo_request in the example) when it gets invoked later on.

As for the destroy_notify parameter, it allows passing a function with the signature void func(void*) (type GDestroyNotify) which is invoked with userdata as the argument once the user data is no longer needed. In the example above, this callback function would be invoked when the URI scheme is unregistered. Or, from a different perspective, this callback is used to notify that the user data can now be destroyed.

One way of implementing handle_echo_request() could be wrapping the request URI, which is part of the WebKitURISchemeRequest parameter to the handler, stash it into a GBytes container, and create an input stream to read back its contents:

static void
handle_echo_request(WebKitURISchemeRequest *request, void *userdata)
{
    const char *request_uri = webkit_uri_scheme_request_get_uri(request);
    g_print("Request URI: %s\n", request_uri);

    g_autoptr(GBytes) data = g_bytes_new(request_uri, strlen(request_uri));
    g_autoptr(GInputStream) stream = g_memory_input_stream_new_from_bytes(data);

    webkit_uri_scheme_request_finish(request, stream, g_bytes_get_size(data), "text/plain");
}

Note how we need to tell WebKit how to finish the load request, in this case only with the data stream, but it is possible to have more control of the response or return an error.

With these changes, it is now possible to make page loads from the new custom URI scheme:

Screenshot of the minicog browser loading a custom echo:// URI
It worked!

Et Tu, CORS?

The main roadblock one may find when using custom URI schemes is that loads are affected by CORS checks. Not only that, WebKit by default does not allow sending cross-origin requests to custom URI schemes. This is by design: instead of accidentally leaking potentially sensitive data to websites, developers embedding a web view need to consciously opt-in to allow CORS requests and send back suitable Access-Control-Allow-* response headers.

In practice, the additional setup involves retrieving the WebKitSecurityManager being used by the WebKitWebContext and registering the scheme as CORS-enabled. Then, in the handler function for the custom URI scheme, create a WebKitURISchemeResponse, which allows fine-grained control of the response, including setting headers, and finishing the request instead with webkit_uri_scheme_request_finish_with_response().

Note that WebKit cuts some corners when using CORS with custom URI schemes: handlers will not receive preflight OPTIONS requests. Instead, the CORS headers from the replies are inspected, and if access needs to be denied then the data stream with the response contents is discarded.

In addition to providing a complete CORS-enabled custom URI scheme example, we recommend the Will It CORS? tool to help troubleshoot issues.

Further Ideas

Once we have WPE WebKit calling into our custom code, there are no limits to what a URI scheme handler can do—as long as it involves replying to requests. Here are some ideas:

  • Allow pages to access a subset of paths from the local file system in a controlled way (as CORS applies). For inspiration, see CogDirectoryFilesHandler.
  • Package all your web application assets into a single ZIP file, making loads from app:/... fetch content from it. Or, make the scheme handler load data using GResource and bundle the application inside your program.
  • Use the presence of a well-known custom URI to have a web application realize that it is running on a certain device, and make its user interface adapt accordingly.
  • Provide a REST API, which internally calls into NetworkManager to list and configure wireless network adapters. Combine it with a local web application and embedded devices can now easily get on the network.

User Script Messages

While URI scheme handlers allow streaming large chunks of data back into the Web engine, for exchanging smaller pieces of information in a more programmatic fashion it may be preferable to exchange messages without the need to trigger resource loads. The user script messages part of the WebKitUserContentManager API can be used this way:

  1. Register a user message handler with webkit_user_content_manager_register_script_message_handler(). As opposed to URI scheme handlers, this only enables receiving messages, but does not associate a callback function yet.
  2. Associate a callback to the script-message-received signal. The signal detail should be the name of the registered handler.
  3. Now, whenever JavaScript code calls window.webkit.messageHandlers.<name>.postMessage(), the signal is emitted, and the native callback functions invoked.
Haven't I seen postMessage() elsewhere?

Yes, you have. The name is the same because it provides a similar functionality (send a message), it guarantees little (the receiver should validate messages), and there are similar restrictions in the kind of values that can be passed along.

It’s All JavaScript

Let’s add a feature to our minimal browser that will allow JavaScript code to trigger rebooting or powering off the device where it is running. While this should definitely not be functionality exposed to the open Web, it is perfectly acceptable in an embedded device where we control what gets loaded with WPE, and that exclusively uses a web application as its user interface.

Pepe Silvia conspiracy image meme, with the text “It's all JavaScript” superimposed
Yet most of the code shown in this post is C.

First, create a WebKitUserContentManager, register the message handler, and connect a callback to its associated signal:

static WebKitUserContentManager*
create_content_manager(void)
{
    g_autoptr(WebKitUserContentManager) content_manager = webkit_user_content_manager_new();
    webkit_user_content_manager_register_script_message_handler(content_manager, "powerControl");
    g_signal_connect(content_manager, "script-message-received::powerControl",
                     G_CALLBACK(handle_power_control_message), NULL);
    return g_steal_pointer(&content_manager);
}

The callback receives a WebKitJavascriptResult, from which we can get the JSCValue with the contents of the parameter passed to the postMessage() function. The JSCValue can now be inspected to check for malformed messages and determine the action to take, and then arrange to call reboot():

static void
handle_power_control_message(WebKitUserContentManager *content_manager,
                             WebKitJavascriptResult *js_result, void *userdata)
{
    JSCValue *value = webkit_javascript_result_get_js_value(js_result);
    if (!jsc_value_is_string(value)) {
        g_warning("Invalid powerControl message: argument is not a string");
        return;
    }

    g_autofree char *value_as_string = jsc_value_to_string(value);
    int action;
    if (strcmp(value_as_string, "poweroff") == 0) {
        action = RB_POWER_OFF;
    } else if (strcmp(value_as_string, "reboot") == 0) {
        action = RB_AUTOBOOT;
    } else {
        g_warning("Invalid powerControl message: '%s'", value_as_string);
        return;
    }

    g_message("Device will %s now!", value_as_string);
    sync(); reboot(action);
}

Note that the reboot() system call above will most likely fail because it needs administrative privileges. While the browser process could run as root to sidestep this issue—definitely not recommended!—it would be better to grant the CAP_SYS_BOOT capability to the process, and much better to ask the system manager daemon to handle the job. In machines using systemd a good option is to call the .Halt() and .Reboot() methods of its org.freedesktop.systemd1 interface.

Now we can write a small HTML document with some JavaScript sprinkled on top to arrange sending the messages:

<html>
  <head>
    <meta charset="utf-8" />
    <title>Device Power Control</title>
  </head>
  <body>
    <button id="reboot">Reboot</button>
    <button id="poweroff">Power Off</button>
    <script type="text/javascript">
      function addHandler(name) {
        document.getElementById(name).addEventListener("click", (event) => {
          window.webkit.messageHandlers.powerControl.postMessage(name);
          return false;
        });
      }
      addHandler("reboot");
      addHandler("poweroff");
    </script>
  </body>
</html>

The complete source code for this example can be found in this Gist.

Going In The Other Direction

But how can one return values from user messages back to the JavaScript code running in the context of the web page? Until recently, the only option available was exposing some known function in the page’s JavaScript code, and then using webkit_web_view_run_javascript() to call it from native code later on. To make this more idiomatic and allow waiting on a Promise, an approach like the following works:

  1. Have convenience JavaScript functions wrapping the calls to .postMessage() which add an unique identifier as part of the message, create a Promise, and store it in a Map indexed by the identifier. The Promise is itself returned from the functions.
  2. When the callback in native code handle messages, they need to take note of the message identifier, and then use webkit_web_view_run_javascript() to pass it back, along with the information needed to resolve the promise.
  3. The Javascript code running in the page takes the Promise from the Map that corresponds to the identifier, and resolves it.

To make this approach a bit more palatable, we could tell WebKit to inject a script along with the regular content, which would provide the helper functions needed to achieve this.

Nevertheless, the approach outlined above is cumbersome and can be tricky to get right, not to mention that the effort needs to be duplicated in each application. Therefore, we have recently added new API hooks to provide this as a built-in feature, so starting in WPE WebKit 2.40 the recommended approach involves using webkit_user_content_manager_register_script_message_handler_with_reply() to register handlers instead. This way, calling .postMessage() now returns a Promise to the JavaScript code, and the callbacks connected to the script-message-with-reply-received signal now receive a WebKitScriptMessageReply, which can be used to resolve the promise—either on the spot, or asynchronously later on.

Even More Ideas

User script messages are a powerful and rather flexible facility to make WPE integrate web content into a complete system. The provided example is rather simple, but as long as we do not need to pass huge amounts of data in messages the possibilities are almost endless—especially with the added convenience in WPE WebKit 2.40. Here are more ideas that can be built on top of user script messages:

  • A handler could receive requests to “monitor” some object, and return a Promise that gets resolved when it has received changes. For example, this could make the user interface of a smart thermostat react to temperate updates from a sensor.
  • A generic handler that takes the message payload and converts it into D-Bus method calls, allowing web pages to control many aspects of a typical Linux system.

Wrapping Up

WPE has been designed from the ground up to integrate with the rest of the system, instead of having a sole focus on rendering Web content inside a monolithic web browser application. Accordingly, the public API must be comprehensive enough to use it as a component of any application. This results in features that allow plugging into the web engine at different layers to provide custom behaviour.

At Igalia we have years of experience embedding WebKit into all kinds of applications, and we are always sympathetic to the needs of such systems. If you are interested collaborating with WPE development, or searching for a solution that can tightly integrate web content in your device, feel free to contact us.


Head shot of Adrián Pérez This article was written by Adrián Pérez.
I have been working on WebKit since 2012, with a focus on environment integration, embedding, and distribution. Igalia has been a life-long project since even earlier.

If you’re using WPE WebKit, or are considering doing so, please take our brief user survey. Your input will help us make WPE WebKit better for you!

If you’re using WPE WebKit, or are considering doing so, please take our brief user survey! Your input will help us make WPE WebKit better for you.