WASM kvvfs in browser extensions' background service workers

(1) By Sergei Nikitin (sneakytin) on 2022-12-27 13:07:00

Hello, I am beyond excited by the work that has been done to bring SQLite to the browser, huge thank you to the people involved! I had a go at integrating it into a project of mine and just seeing it work was something else.

I understand that WASM builds are currently only intended to work in a browser, specifically either a web page environment or a "dedicated worker" spawned by one. My use case stretches these boundaries a bit: I need to access SQLite from a browser extension's background "service worker", which runs into the well-documented limitations of persistent storage options.

The one limitation I'm most interested in right now is the kvvfs storage - it only works in a "main UI thread" that has access to things like Window.localStorage. Although an extension's background workers don't have access to these APIs, they do have access to something very similar: browser.storage.local. That's also a key-value storage with differently named but functionally identical APIs (e.g. get()/set() instead of getItem()/setItem()). Would it be reasonable to contribute the ability to use these stores in addition to the existing kvvfs options?

I don't know of a reliable way to detect at runtime whether code is being executed in a browser extension environment, but there probably is one. Naively, the list of places that would have to change looks quite small, with only one location actually referencing the methods whose names differ between a web page and an extension.

Thank you!

Appendix 1 - anatomy of a browser extension

Just in case the reader is unfamiliar with browser extensions, here is a bare-bones TL;DR. These days they consist of 3 main parts: "content" + "popup" (which run more or less in a web page environment) and "background" (a single worker that is similar to the ones spawned via "new Worker()", but has access to a different set of APIs and has a different lifecycle).

Appendix 2 - an advantage of browser.storage over other kvvfs storage

I'm mainly interested in key-value stores like browser.storage.local because they work in the environment I need. But, unlike their Window.* equivalents, their storage size limits can be lifted in a standard, cross-browser way by requesting a permission (the "unlimitedStorage" manifest permission).

Appendix 3 - does OPFS work in a browser extension environment?

Right now - no. From the docs it's clear that in a browser environment OPFS is "the way to go", which should work within "dedicated workers", but "service worker" != "dedicated worker" (ref 1, ref 2) and certain APIs are missing. I don't have enough insight to say whether we can expect the missing APIs to eventually come to "service workers", but it seems there is a good chance it will eventually be possible to spawn "dedicated workers" from them, which should unblock the use of the worker API.

(2) By Stephan Beal (stephan) on 2022-12-27 13:36:15 in reply to 1

The one limitation I'm most interested in right now is the kvvfs storage - it only works in a "main UI thread" that has access to things like Window.localStorage. Although an extension's background workers don't have access to these APIs, they do have access to something very similar: browser.storage.local. That's also a key-value storage with differently named but functionally identical APIs (e.g. get()/set() instead of getItem()/setItem()). Would it be reasonable to contribute the ability to use these stores in addition to the existing kvvfs options?

It sounds like it would not be fundamentally a problem to support those. In the bootstrapping of the JS API we can check for the browser.storage.local object and install a proxy for that. It would mean that the pseudo-magical name "local" would refer to different storage in that context, and the name "session" would be invalid, but i don't see those as serious issues.

i will work on a patch for that after the current task is done but will need your help in testing it out. Please contact me off-list (stephan @ this domain) with your contact info so we can bounce that back and forth.

I don't know of a reliable way to detect at runtime whether code is being executed in a browser extension environment, but there probably is one.

If the "browser.storage" object is available to extensions, we can presumably simply look for the "browser" object and inspect it. The "browser" object is not defined in a typical web page context, so there would be no ambiguity there.

That said: this all assumes that the bootstrapping and WASM loading code installed by Emscripten can run in your environment. If it can't, you're out of luck. Though we do have an alternate WASM loader, we do not have a way to obtain and feed all of the necessary system-level APIs into the WASM module when it's loaded - that's a big part of what Emscripten does for us. Without those (many) APIs, the module cannot load.

Just in case the reader is unfamiliar with browser extensions

Entirely unfamiliar, and this is not a use case we've ever considered up to this point.

Appendix 3 - does OPFS work in a browser extension environment? ... which should work within "dedicated workers", but "service worker" != "dedicated worker"

We don't distinguish between types of workers: we simply inspect the environment for the appropriate APIs. If they are found, we use them and hope for the best.
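Inspecting the environment instead of the worker type can be sketched as a plain feature test. This is a minimal illustration, not the library's actual check: the function name `looksOpfsCapable` is hypothetical, and the probed names (`SharedArrayBuffer`, `Atomics`, `FileSystemFileHandle.prototype.createSyncAccessHandle`, `navigator.storage.getDirectory`) are an assumption about what an OPFS-backed VFS would plausibly need. The global object is a parameter so the check can be exercised with stand-ins.

```javascript
// Hypothetical feature test: reports whether the given global object exposes
// the APIs an OPFS-based VFS would plausibly rely on. It never asks which kind
// of worker (dedicated, shared, service) it is running in.
function looksOpfsCapable(g = globalThis) {
  const fileHandle = g.FileSystemFileHandle;
  const storage = g.navigator && g.navigator.storage;
  return Boolean(
    g.SharedArrayBuffer &&
    g.Atomics &&
    fileHandle &&
    fileHandle.prototype &&
    typeof fileHandle.prototype.createSyncAccessHandle === 'function' &&
    storage &&
    typeof storage.getDirectory === 'function'
  );
}
```

In a context where `createSyncAccessHandle` is missing, as described above for extension service workers, such a check simply reports false and the OPFS option is skipped.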

According to your ref2: "Service Workers have a dedicated role of being a proxy between the network and the browser and/or cache." That seems to rule them out as being useful for OPFS, which is not a network service.

(3) By Sergei Nikitin (sneakytin) on 2022-12-27 14:49:06 in reply to 2

i will work on a patch for that after the current task is done

That's very generous, thank you!

If the "browser.storage" object is available to extensions, we can presumably simply look for the "browser" object and inspect it.

That would work. There is some browser-specific ugliness to note, though: I believe Firefox (and Edge?) exposes it as "browser.storage" while Chrome uses "chrome.storage". I'd have to do more tests. At a quick glance, it doesn't look like your current JS code has to deal with any of that, and maybe it doesn't want to.
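A minimal sketch of that detection, assuming only that the storage area is reachable as `browser.storage.local` (Firefox) or `chrome.storage.local` (Chrome); the helper name `findExtensionStorageArea` is hypothetical. Chrome also defines a `chrome` object in ordinary web pages, which is why the check looks for `storage.local` and its methods rather than for the namespace alone.

```javascript
// Hypothetical helper: returns the extension storage area if one exists,
// preferring the `browser` namespace and falling back to `chrome`.
// Returns null in an ordinary web page, where neither area is defined.
function findExtensionStorageArea(g = globalThis) {
  for (const ns of [g.browser, g.chrome]) {
    const area = ns && ns.storage && ns.storage.local;
    if (area && typeof area.get === 'function' && typeof area.set === 'function') {
      return area;
    }
  }
  return null;
}
```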

this all assumes that the bootstrapping and WASM loading code installed by Emscripten can run in your environment

Yes, it loads and works in a browser extension environment with some light, non-invasive massaging specific to the JS tooling my project uses for scaffolding, although so far I have only been able to test the 'memory' store.

We don't distinguish between types of workers: we simply inspect the environment for the appropriate APIs.

Fair, especially since it yields a result that matches both OPFS docs & current sqlite docs.

(4) By Stephan Beal (stephan) on 2022-12-27 15:00:07 in reply to 1

Although an extension's background workers don't have access to these APIs, they do have access to something very similar: browser.storage.local.

Bad news: all of those APIs are asynchronous, which makes them unusable for our purposes.

https://developer.chrome.com/docs/extensions/reference/storage/#type-StorageArea

😢
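To make the mismatch concrete: a naive adapter mapping the `localStorage`-style calls onto a `storage.local` area (roughly what a proxy would do) cannot return a string from `getItem()`, because `get()` only resolves later. This is a hypothetical sketch: `makeNaiveAdapter` and the in-memory `makeFakeArea` stand-in are assumptions, with the stand-in's `get(key)` mimicking the promise-returning extension API, which resolves to an object of key/value pairs.

```javascript
// In-memory stand-in for an extension StorageArea (hypothetical): like the
// real one, get()/set()/remove() return Promises.
function makeFakeArea() {
  const data = new Map();
  return {
    async get(key) { return data.has(key) ? { [key]: data.get(key) } : {}; },
    async set(items) { for (const [k, v] of Object.entries(items)) data.set(k, v); },
    async remove(key) { data.delete(key); },
  };
}

// Hypothetical localStorage-shaped proxy over an async StorageArea.
// getItem() cannot hand back the string synchronously; the best it can
// do is return a Promise, which a synchronous caller cannot use.
function makeNaiveAdapter(area) {
  return {
    getItem(key) { return area.get(key).then((r) => (key in r ? r[key] : null)); },
    setItem(key, value) { return area.set({ [key]: String(value) }); },
    removeItem(key) { return area.remove(key); },
  };
}
```

A synchronous caller doing `const v = adapter.getItem('k')` receives a pending Promise instead of the stored string, which is exactly what a C-level kvvfs read routine cannot consume.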

Simulating a 100% synchronous interface on top of an asynchronous one unfortunately requires a huge amount of effort and only works from Worker threads because it requires use of Atomics.wait(), which is not legal from a main/UI thread. (That's how we implement the OPFS VFS proxy.) We cannot use the "await" keyword from synchronous code because "await" can only be used from functions tagged as "async", which implicitly changes their return values to Promises, and we cannot bind Promise-returning functions to C.
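The general shape of that technique, stripped of everything sqlite-specific, is sketched below for Node.js (`worker_threads`), where `Atomics.wait()` happens to be permitted on the main thread; in a browser the waiting side would have to be a Worker. It is not the OPFS VFS's actual code: the helper name `runAsyncBlocking`, the two-slot shared-buffer layout and the status codes are assumptions made for illustration. The helper thread reports failures explicitly, since a rejected Promise there would otherwise leave the waiting side blocked until the timeout.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical shared-memory protocol (two Int32 slots):
//   slot 0: status (0 = pending, 1 = fulfilled, 2 = rejected)
//   slot 1: result value when fulfilled
const PENDING = 0;
const FULFILLED = 1;
const REJECTED = 2;

// Runs `asyncFnSource` (source text of an async function returning an int32)
// on a helper thread and blocks the calling thread until it settles.
function runAsyncBlocking(asyncFnSource, timeoutMs = 5000) {
  const view = new Int32Array(new SharedArrayBuffer(8));
  const worker = new Worker(`
    const { workerData } = require('node:worker_threads');
    const view = new Int32Array(workerData.sab);
    (${asyncFnSource})().then(
      (value) => { Atomics.store(view, 1, value); Atomics.store(view, 0, ${FULFILLED}); },
      () => { Atomics.store(view, 0, ${REJECTED}); }
    ).finally(() => Atomics.notify(view, 0));
  `, { eval: true, workerData: { sab: view.buffer } });

  // Sleep until slot 0 leaves PENDING; returns at once if it already has.
  const outcome = Atomics.wait(view, 0, PENDING, timeoutMs);
  worker.unref(); // do not keep the process alive for the helper thread
  if (outcome === 'timed-out') return { status: 'timed-out' };
  return Atomics.load(view, 0) === FULFILLED
    ? { status: 'fulfilled', value: Atomics.load(view, 1) }
    : { status: 'rejected' };
}
```

Even in this toy form, every call costs a cross-thread round trip and the result has to be squeezed through shared memory, which hints at why doing this for a whole VFS is a large effort.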

My apologies, but we won't be able to support that storage API with KVVFS.

(5) By Sergei Nikitin (sneakytin) on 2022-12-27 15:12:51 in reply to 4

Ah right, I missed that it's quite impactful for your code :( That means web extension authors will have to wait until it's possible to launch Workers from them, by which point the road to OPFS will be open and there will be little motivation to use kvvfs anyway.

Thank you for looking into this!

(6) By Stephan Beal (stephan) on 2022-12-27 15:33:31 in reply to 5

Ah right, I missed that it's quite impactful for your code...

If they weren't async it would have taken less than 10 minutes of work to do, and i actually didn't notice that they were async until i was done writing the code and was about to start adding tests for it.

Async APIs exist to account for network latency. Why on earth browser developers insist on Asyncing All The Things, especially local-storage APIs, which have no latency, is truly a mystery.

If you can come up with a synchronous persistent storage option for browser extensions, we can certainly find a way to attach sqlite3 to it.

(7.1) By Sergei Nikitin (sneakytin) on 2023-01-24 13:02:56 edited from 7.0 in reply to 4

I stumbled upon an effort to standardise a (currently flagged, experimental) API to let WASM modules call JS async APIs synchronously -- they call it JSPI.

They talk about Emscripten specifically in the post, so perhaps it could eventually make integrations like this cheap for SQLite.

(8) By Stephan Beal (stephan) on 2023-01-24 15:55:16 in reply to 7.1

I stumbled upon an effort to standardise a (currently flagged, experimental) API to let WASM modules call JS async APIs synchronously -- they call it JSPI.

i've discussed this with the Emscripten folks before and it appears to have a fatal flaw: though successful promises can be generically handled this way, there appears to be no way to handle failed promises because a generic rejection handler cannot possibly know what to return back to C in that case. Notice that the article does not make any mention of error handling or promise rejection. The words "error", "reject", and "fail" are nowhere to be found.

(9) By Sergei Nikitin (sneakytin) on 2023-04-07 10:34:25 in reply to 8

This does indeed sound like a fatal flaw, so fatal in fact that it seems downright suspicious.

Based on what I can see, the expectation is that the C/C++ code will use the EM_ASYNC_JS macro to wrap every async JS API. This macro expects a JavaScript code block as its last parameter. Indeed, I also failed to find any direct statements from the JSPI creators on how rejected Promises will be handled. But even if there's truly nothing more to it, C/C++ code does have full control of the JS code block it passes to the macro. This means that it's trivial to ensure that the block will always resolve to a successful Promise. The return value and/or type of the C/C++ function will have to be massaged accordingly, of course, but still!

I'll try to illustrate my thoughts with an example, although it's all just based on public docs for Asyncify, so take this with a grain of salt as I'm not an expert in Emscripten.

// in JS
async function jsSometimesThrow() {
    // Math.random() is in [0, 1), so this fails roughly half of the time
    if (Math.random() > 0.5) {
        return 42
    }
    throw new Error("oh no!")
}

A naive JSPI/Asyncify wrapping of it would then be:

// in C
EM_ASYNC_JS(int, cFoo, (), {
  return await jsSometimesThrow()
});

I agree, it's unclear how cFoo() will behave if the implementation throws. However, when a JS Promise is rejected, await simply throws an exception, so it's easy to modify the C code to:

// in C
EM_ASYNC_JS(int, cFoo, (), {
  try {
      return await jsSometimesThrow()
  }
  catch {
      return 777
  }
});

Now the behaviour of cFoo() is clear & deterministic. Depending on the JS API, the catch block may have to be more complex, it may influence the return type of cFoo() etc, but the end result is still the same -- C code can get the failure result in a form that C code needs.
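The same idea can be packaged as a reusable JS helper so that every wrapped call settles to a plain integer either way. This is a hypothetical sketch: the name `settleToInt` and the default `-1` failure code are assumptions, not part of Emscripten or JSPI, and whether a standardized JSPI binding could lean on such a wrapper without Emscripten's C-side macros is a separate question.

```javascript
// Hypothetical helper: turns a possibly-rejecting async function into one
// whose Promise always fulfils with an int32, so a caller expecting a C int
// never observes a rejection. `onError` maps the failure to a chosen code.
function settleToInt(asyncFn, onError = () => -1) {
  return async (...args) => {
    try {
      return (await asyncFn(...args)) | 0; // coerce to int32, like a C int
    } catch (err) {
      return onError(err) | 0;
    }
  };
}

// Usage with a function that fails on demand:
const mayFail = async (shouldFail) => {
  if (shouldFail) throw new Error('oh no!');
  return 42;
};
const safe = settleToInt(mayFail, () => 777);
```

Here `safe(false)` fulfils with 42 and `safe(true)` fulfils with 777, mirroring the try/catch inside the EM_ASYNC_JS block.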

(10) By Stephan Beal (stephan) on 2023-04-07 13:42:34 in reply to 9

But even if there's truly nothing more to it, C/C++ code does have full control of the JS codeblock it passes to the macro.

Only if it uses that Emscripten-specific capability. Our C code is 100% Emscripten-free and we'd like to keep it that way. We're not anti-emscripten, but any sort of vendor-specific lock-in doesn't sit well with any of us (in particular a "moving target" like Emscripten).

Now the behaviour of cFoo() is clear & deterministic.

My understanding is that JSPI is intended to become a standard, in which case it cannot rely on C-side Emscripten macros. That approach certainly looks feasible for projects which don't self-impose a restriction against C-level dependency on Emscripten.

(11.1) By mlaw (tantaman) on 2023-04-09 11:21:31 edited from 11.0 in reply to 9

FWIW, you could use wa-sqlite or cr-sqlite, which both work in service workers and with async storage APIs.

(12) By Stephan Beal (stephan) on 2023-04-09 13:52:47 in reply to 11.1

FWIW, you could use wa-sqlite or cr-sqlite, which both work in service workers and with async storage APIs.

To add to that: the reason ours doesn't work out of the box with 3rd-party async APIs is that ours doesn't use the Asyncify tool.

(13) By mlaw (tantaman) on 2023-04-09 14:56:48 in reply to 12

I should also mention the trade-offs that come with Asyncify: slightly reduced performance and increased binary size. The fact that you can run it in the main thread does, however, offset some of the performance impact (with new trade-offs, of course).

The wa-sqlite repo has some good benchmarks comparing builds with and without Asyncify.