SQLite Forum

Announcement and RFC: JS API features
Login

Announcement and RFC: JS API features

(1) By Stephan Beal (stephan) on 2022-12-09 03:43:13 [link] [source]

Good early morning, all,

As most of you are certainly aware, the sqlite project recently dipped its toes into the previously unthinkable (for these old C coders) realm of JavaScript:

wasm:/

(Scary, right? What's the world coming to?)

With this week's addition of support for creating virtual tables and table-valued functions in JS, the documentation has passed the arbitrary 100-page milestone1. The API is reaching the point where thoughts are turning to the question, "what APIs are missing (which people will genuinely need in JS)?"

Notable TODOs in that regard include:

  • Authorizer support. That might seem silly for a client-side db in a browser, but some applications (like the Fossil SCM) make use of authorizers to help ensure that the application doesn't end up hosing its own database through unintentional misbehavior.
  • Custom collations are in place but need a small amount of tweaking to make them simpler to use from JS.

But... what else? That is: what features would you, as an intrepid JavaScript programmer of next-generation client-side web applications, like to see which are currently missing? As a basis for comparison, the full list of WASM-exported C APIs can be seen at:

wasm:/doc/trunk/module-symbols.html (Requires a WASM-capable browser)

and the current list of TODOs is summarized at:

wasm:/doc/trunk/todo.md

Prerelease snapshots of the JS/WASM pieces are uploaded to that site as new features are added and bugs are fixed, so there's no need to wait on the 3.41 release to try them out. In fact, earlier is better, as we'd like to cement most of the APIs in place with 3.41, giving them the same strong API-stability guarantees the C API enjoys.

It's long been my experience that the best feature suggestions come from users of a project other than its own developers, and your feedback would be valuable in shaping the direction of this side project. Feel free to post it here in the forum or, if you're feeling shy, send it to me off-list at gro.etilqs@nahpets.

Thank you for your time!


  1. ^ Counting 66 lines per page, dropping to about 80 pages if we strip all blank lines before counting.

(2) By jose isaias cabrera (jicman) on 2022-12-09 13:33:08 in reply to 1 [link] [source]

Thank you for this work, Stephan. Continue on...

(3) By Alex MacCaw (maccman) on 2022-12-12 09:44:40 in reply to 1 [link] [source]

Thank you for your incredible work on this.

This might not be in the letter of your question but it is hopefully in the spirit.

The most important things to make this extremely useful and adopted across the web are in my opinion:

Speed of read/writes. This needs to be an order of magnitude faster than IndexedDb. I suspect you've already accomplished this, but could be useful to benchmark.

Persistence. I know that Chrome's new APIs make file system persistence possible, but I think it's really important that we have an IndexedDb (or equiv) fallback. Case in point, my company Reflect has a web app, a desktop app (via Electron), and an iOS app that is actually just a wrapped web app (using Capacitor). We only control the browser in one of these cases: the Electron desktop app. So in order to use client-side Sqlite we need persistence fallbacks for the other platforms.

Thank you again!

(4) By Stephan Beal (stephan) on 2022-12-12 10:32:52 in reply to 3 [link] [source]

This might not be in the letter of your question but it is hopefully in the spirit.

Absolutely.

Speed of read/writes. This needs to be an order of magnitude faster than IndexedDb. I suspect you've already accomplished this, but could be useful to benchmark.

We had strong performance goals in mind from the start and spent a great deal of time optimizing the OPFS-related VFS to perform well, to the point that it's competitive with WebSQL (sometimes faster, sometimes slower, depending on the workload), noting that WebSQL is implemented in native code and ours is pure JS.

We did not benchmark against IndexedDb because, frankly, IndexedDb doesn't really concern us. It's a whole different beast with a whole different API, and any sort of comparisons would be apples vs. oranges. We experimented only very briefly (a single afternoon) with using IDB as backing store for an sqlite db, but we didn't like where that was headed. That approach simply doesn't sit well with any of us.

The speed of non-persistent I/O is essentially limited to that of the underlying JS and WASM engines. All we can really do there, in terms of influencing the speed, is compile at different optimization levels. (Compiling with -O2 consistently gives faster results than any other level.)

We don't have any truly user-friendly benchmarking tools, as we just use them during testing and development, but they're accessible at:

https://wasm-testing.sqlite.org/

In particular, the "speedtest1" tools are our basis for benchmarking because they're WASM builds of the C library's primary benchmarking tool. Using that, we can do direct comparisons against "native" (out-of-browser) speeds.

Persistence. I know that Chrome's new APIs make file system persistence possible, but I think it's really important that we have an IndexedDb (or equiv) fallback.

cough localStorage cough sessionStorage cough

An IDB-based backend is not currently in the cards for this project. There are other projects which store sqlite in IDB:

This is not to be misunderstood as an official project-level stance on the topic, but i'm personally in no way keen on supporting IDB. That said, my strongly negative opinion on IDB is admittedly based mostly on the anecdotes of others, rather than personal experience, and counter-anecdotes would be dutifully considered1. Even so, the list of JS/WASM TODOs and want-to-dos is long enough that, even if someone were to convince us that an IDB-based implementation was worth the effort of development and supporting it forever, it's unlikely that it would actually get done before the middle of 2023. By that time, the non-Chromium browsers will certainly (we can hope) have caught up with OPFS, making IDB a non-topic for us.


  1. ^ This article, in particular, has shaped my opinions on IDB

(5) By mlaw (tantaman) on 2022-12-12 14:09:12 in reply to 1 [link] [source]

1 - I've mentioned this before but: support for many tabs interacting with the same database. I know you've made significant progress here but I'm not sure what the current state is.

2 - Run time loadable extensions for WASM. It seems that any SQLite extension you'd like to use in WASM has to be statically linked at build time.

If (2) isn't possible then some examples of how to statically link and compile, to WASM, extensions that are not written in C. As an example, we're currently trying to figure out how to make Rust extensions play well with SQLite WASM -- https://github.com/asg017/sqlite-base64/issues/1

(6) By Stephan Beal (stephan) on 2022-12-12 15:16:46 in reply to 5 [link] [source]

1 - I've mentioned this before but: support for many tabs interacting with the same database. I know you've made significant progress here but I'm not sure what the current state is.

"Many" seems unlikely to happen any time soon in a browser environment, given OPFS's current locking limitations. Roy H. has done some intriguing work with Web Locks in his wa-sqlite project, but whether or not we can retain our current performance-related goals if we take a similar route is still unknown and a pending cross-country move will impede any experimentation with that through at least late Q1 2023.

The current "state of the art," using the default VFS settings, is that we can consistently get about 3 OPFS connections to the same database without contention if they are careful to perform work in small chunks.

However, we also have a so-called "unlock-asap" VFS flag which can approximately double that amount at the cost of a significant performance hit (as much as 400% in I/O-intensive benchmarks but essentially negligible for I/O-light workloads). The most connections i've witnessed run ob a single OPFS-hosted db without contention errors is 8, each one performing a small update at 1-second intervals for about 100 seconds. Frankly, though, i suspect that was a fluke. Anything more than 6-ish (with unlock-asap mode) is likely asking for eventual problems.

Relevant links:

Run time loadable extensions for WASM. It seems that any SQLite extension you'd like to use in WASM has to be statically linked at build time.

AFAIK, dlopen() is not currently possible in browser-side WASM, but if you have a way to do this then i'm all ears. It would be a nice capability.

If (2) isn't possible then some examples of how to statically link and compile, to WASM, extensions that are not written in C.

Such know-how is unfortunately currently well above my proverbial pay grade :/. This is a JS-and-C project and any creation of non-C/non-JS extensions is currently way out of our scope. That's not to rule it out as an eventual point of experimentation, but not in the immediate future.

(7) By anonymous on 2022-12-12 19:34:47 in reply to 4 [link] [source]

Figuring out whether/how it's possible to import wa-sqlite's VFSes into client-level code which uses our API is on my TODO list.

Not saying this wouldn't be great because it would be. However, just providing a minimal Javascript-only VFS to use as a template would be less work for you and still quite useful IMO. Currently kvvfs isn't Javascript only and opfs is a lot to wade through. A sample VFS that uses ArrayBuffer for storage - with no SharedArrayBuffer/Atomics, no automatic registration, etc. - would be a much easier starting point for alternative backends.

(8) By Stephan Beal (stephan) on 2022-12-12 20:22:37 in reply to 7 [link] [source]

However, just providing a minimal Javascript-only VFS to use as a template would be less work for you and still quite useful IMO.

That's actually something i considered last week but a VFS is always specific to some particular storage. The JS API is as close to the C API as it can be primarily so that we do not have to duplicate any documentation or example code efforts specifically for JS. VFSes, given their overall complexity, are not well-suited to a three-step tutorial. The existing C-based docs and demo code will have to suffice for those for the time being.

Your example was one i considered last week...

A sample VFS that uses ArrayBuffer for storage - with no SharedArrayBuffer/Atomics, no automatic registration, etc. - would be a much easier starting point for alternative backends.

i've yet to find a practical use for such a demo VFS, so am hesitant to invest any time in it given the many current competing priorities. We already have in-memory databases which perform faster that any ArrayBuffer-backed one would and we have sqlite3_js_db_export() with which to convert them to JS byte arrays for storage. Every VFS is necessarily tied so some storage or other (be it virtual or not) and all(?) of the existing bases seem to be covered reasonably well. It would, however, be ideal if VFSes built using such tools could be plugged in with our API. Exploring that is somewhere in the double-digit entries of the TODOs, though.

(9) By Alex MacCaw (maccman) on 2022-12-13 06:08:58 in reply to 4 [link] [source]

I totally understand where you're coming from regarding IDB. I share a similar disdain, which is why I'm so excited about having alternatives :)

Unfortunately localStorage and sessionStorage won't cut it for us (or maybe for anyone), because of the 5mb storage limit.

I really hope you're right about non-chromium support for OPFFS, but I suspect it's going to take a LONG time for iOS Safari to support OPFFS if they ever do. I don't believe Apple has shown any recognition of support and, in general, they seem committed to handicapping web apps on iOS Safari so to not cannibalize their app-store revenue.

Will dig into wa-sqlite VFSes - thanks for the pointer.

(10) By Stephan Beal (stephan) on 2022-12-13 07:32:54 in reply to 9 [link] [source]

Unfortunately localStorage and sessionStorage won't cut it for us (or maybe for anyone), because of the 5mb storage limit.

Richard created the VFS for those on a lark, mainly to see peoples' eyes bug out, but they are a real thing and do work. Very limited in size and not as durable (in the ACID sense) as most other storage, but suitable for small use cases.

but I suspect it's going to take a LONG time for iOS Safari to support OPFFS if they ever do.

My (mis?)understanding, going solely from memory of reading some of Roy Hashimoto's OPFS-related articles, is that Safari has OPFS support in an older form, without the so-called sync access handles (which we require). That might be me misremembering, though. Firefox has an active ticket open for it and they've been making progress recently.

My optimistic hypothesis is that some truly exciting web apps will be created for OPFS+sqlite in 2023 and that that will light a proverbial fire under Safari and Firefox to get them to catch up :).

Will dig into wa-sqlite VFSes - thanks for the pointer.

He's done a lot of work on the topic and understands the subtleties VFSes far, far better than i do (the OPFS VFS was my first one ever).

(11) By anonymous on 2022-12-13 15:42:01 in reply to 10 [link] [source]

Safari has supported OPFS sync access handles for about a year (which was ~6 months earlier than Chrome stable), i.e. with the API that SQLite WASM is using. They don't yet have the recent spec changes to make all the methods on sync access handles synchronous that just went stable in Chrome last month but almost no one is using that yet.

Firefox hasn't shipped OPFS support yet but will have the updated sync access handle API when they do.

(12) By Stephan Beal (stephan) on 2022-12-13 16:03:01 in reply to 11 [link] [source]

Safari has supported OPFS sync access handles for about a year

We had a meeting with several Safari folks late last Summer and they gave the impression that there was an important (for us) part of the OPFS API which they didn't have yet and weren't sure they would have, but i can't for the life of me recall what it was. If that's not/no longer the case, then great.

They don't yet have the recent spec changes to make all the methods on sync access handles synchronous that just went stable in Chrome last month but almost no one is using that yet.

We're still applying "await" to those newly-updated methods and will be for the foreseeable future to avoid breaking Chrome versions older than 108. "Someday" we'll remove "await" from those, but there's no pressing need to because "await" is a no-op on non-async functions.

Unfortunately, Chrome is not making all the methods synchronous: opening a sync access handle is (ironically) still asynchronous. If they would make all of the API synchronous we could gain 30-odd percent performance by eliminating the cross-thread communication parts which we currently require in order to hide the async parts behind the synchronous sqlite3 API. The last time it was measured, roughly mid-30s percent of the run-time was spent waiting at the cross-thread communication boundaries. As long as even a single method is asynchronous, however, the "viral" nature of JS's async/await keywords forces us to keep the OPFS-using parts in a separate thread from the sqlite3-speaking parts :(.

(13.2) By mlaw (tantaman) on 2022-12-15 11:51:27 edited from 13.1 in reply to 1 [link] [source]

Smaller binary size :)

Any reason that wasm-opt is currently not run against the sqlite build? It reduces the final wasm size from 3.2 to 1.3mb. Compressing that resulting wasm-opt binary via brotli gets you to 400kb.

(14) By mlaw (tantaman) on 2022-12-15 11:53:46 in reply to 6 [link] [source]

Looks like dlopen is supported in browser-side WASM: https://emscripten.org/docs/compiling/Dynamic-Linking.html#runtime-dynamic-linking-with-dlopen

I've made progress on statically linking Rust extensions to WASM but since I've discovered runtime linking via emscripten I'll probably switch to trying to get that working.

(15) By Stephan Beal (stephan) on 2022-12-15 13:14:11 in reply to 13.2 [link] [source]

Any reason that wasm-opt is currently not run against the sqlite build? It reduces the final wasm size from 3.2 to 1.3mb.

Our release builds are currently a tick over 800kb, roughly half that gzipped. The 3mb builds use the default/fast build flags because it's too painful to wait 2 full minutes a hundred times a day for each build between edits. As to the precise invocation of tools, we currently leave that to emcc with one exception documented at length in the makefile.

Run "make oz" to get the smallest builds and "make o2" to get the ones with the fastest runtime speeds. We use "oz" for all releases and snapshot builds.

Compressing that resulting wasm-opt binary via brotli gets you to 400kb.

Compression is entirely up to the hosting service and ours don't support brotli.

(16) By Stephan Beal (stephan) on 2022-12-15 13:26:53 in reply to 14 [link] [source]

Looks like dlopen is supported in browser-side WASM: https://emscripten.org/docs/compiling/Dynamic-Linking.html#runtime-dynamic-linking-with-dlopen

This part might make that a no-go:

... instead, you must load the side module into the filesystem, so that dlopen (or fopen, etc.) can access it (except for dlopen(NULL) which means to open the current executable, which just works without filesystem integration).

Except for the fiddle app, which currently still relies on the emscripten FS API for one feature, we very specifically do not use it in the main library in the interest of eventually weaning ourselves from dependency on any one specific toolchain (a topic very near and dear to my heart).

i'll look into dlopen support but make no promises because anything which ties the canonical builds even closer to emscripten-specific capabilities simply won't sit well with me. Demonstrating it as an option for those who don't mind that dependency is something we can do, but not immediately due to a pending cross-country move, followed by house hunting, on my end.

(17) By mlaw (tantaman) on 2022-12-19 14:11:00 in reply to 16 [source]

afaik, it only uses the fs api so dlopen can work as normal (by passing a file path).

I know you're not planning on working on this so I don't expect an answer but just as an update --

I got runtime loadable extensions working in the browser with wa-sqlite with some work. Was even able to load and run extension written in other languages at runtime. I figured applying the same techniques to the official sqlite build would just work but, to my surprise, there's a different set of issues that came up.

The first problem was that wasm-strip removes the dylink table. There's probably a flag to prevent that from being stripped, currently just disabled it in my build for the time being.

The next is an issue with jaccwabyt's use of __indirect_function_table. When adding the -sMAIN_MODULE=2 emcc flag to SQLite's wasm build, this table no longer exists for reasons currently unknown to me.

Random nits:

  • The sed command in GNUMakefile needs a ; after the /d to work on BSD like systems (/^export default/d;)
  • The test-opfs-vfs.html example is making a call to instanceForPointer which was removed in https://www.sqlite.org/cgi/src/info/a190abc307847174

(18) By Stephan Beal (stephan) on 2022-12-19 14:39:57 in reply to 17 [link] [source]

afaik, it only uses the fs api so dlopen can work as normal (by passing a file path).

i talked with the Emscripten devs about that a few days ago. They explained that one major limitation with dlopen() is that the loading module must provide any and all exported symbols for the dll. If a module only relies on sqlite3 APIs which have already been exported, it "should work," but as soon as they rely on anything else, whether it be non-exports sqlite3 APIs or system-level APIs which are not linked in to/exported from sqlite3.wasm, it won't work.

The first problem was that wasm-strip removes the dylink table.

That is interesting. Perhaps there's a wasm-strip flag to tell it not to do that.

Some background on why we need wasm-strip: Emscripten assumes that all code using the module is part of that same module, and minifies all code in the module by default. When all code is in the module, that's fine. As soon as code outside of the module wants to access it (as is the case in all libraries), their code minification breaks everything - it's impossible to access the library-level code. The workaround is to build with -g3, which disables all minification. After that, we have to wasm-strip to get the wasm file down to a manageable size.

i've spoken to the Emscripten devs about this particular quirk of the build but they couldn't offer a workaround. They're now aware of the problem, though, and might (might) offer a way to disable JS symbol minification in non-g3 builds at some point. Right now, -g3 + wasm-strip is the only way for us to build a working JS/wasm combination which isn't 3+MB.

That same build quirk is why we cannot use the -sSINGLE_FILE option, as we have no opportunity to strip the wasm file before it gets embedded in the JS file, leaving us with a 5MB JS file.

The next is an issue with jaccwabyt's use of __indirect_function_table. When adding the -sMAIN_MODULE=2 emcc flag to SQLite's wasm build, this table no longer exists for reasons currently unknown to me.

That's also new to me. We need the indirect function table in order to support UDFs and binding of JS functions via callbacks, e.g. for sqlite3_exec(). That's a standard feature of wasm (though its name is only semi-standard), so why it would be removed is a mystery to me.

Looking at the description of MAIN_MODULE=2 in their docs, it doesn't sound like something we'll be able to support. (There are many of their options we cannot support, for that matter, e.g. a fixed function table size, and that's an unavoidable fact of life.)

The sed command in GNUMakefile needs a ; after the /d to work on BSD like systems (/^export default/d;)

Thank you, i'll patch that. The need for sed is unfortunate, and i've filed an RFE with Emscripten which would let us eliminate that requirement, but we currently don't have any other way of removing their automatic "export default".

The test-opfs-vfs.html example is making a call to instanceForPointer which was removed...

That one hasn't been used in a while. i'll remove it entirely unless it's part of the dist builds, in which case i'll remove that part of it.

(19) By mlaw (tantaman) on 2022-12-19 15:21:43 in reply to 18 [link] [source]

If a module only relies on sqlite3 APIs which have already been exported, it "should work," but as soon as they rely on anything else, whether it be non-exports sqlite3 APIs or system-level APIs which are not linked in to/exported from sqlite3.wasm, it won't work.

Thanks for mentioning this. I did notice this but didn't pay much attention at the time. In other words, extensions I brought in were requiring me to export extra symbols from the core sqlite build that were not sqlite related functions. Yeah... this just isn't going to work and it might be time to end my experiments.

I'll post a writeup on vlcn.io somewhere in case some future person wants to continue this sort of work.

(20) By Stephan Beal (stephan) on 2022-12-19 15:24:18 in reply to 19 [link] [source]

I'll post a writeup on vlcn.io somewhere in case some future person wants to continue this sort of work.

Please post the link when you do and i'll post it somewhere in the /wasm docs, either "related works" or maybe a new page (perhaps you can recommend a subject under which to wrap such stuff).