wasm - tree-shakable modules for bundlers

(1) By randName on 2024-01-03 12:19:20

I have been working on the JS side of the WASM library, and I think I've got a sufficient grasp of the parts involved to make a tree-shakable version (or at least start on it).

First I will need to outline some of the constraints/goals:

- use the "official" .wasm build (at sqlite/sqlite-wasm) as-is, no building C
- types via JSDoc and .d.ts files (no transpiling should be needed, though development will be using Vite)
- minimal function overloads (it looks like a lot of code exists just to support different call signatures)
- use OPFS (via SyncAccessHandle) for the VFS by default (if launched in a worker), but not through the oo SAHPool API

The main idea is that if you only import the functions to open a singleton DB and exec a few statements, your final code should not need to include any of the classes. Or, even more granularly, if you only retrieve your rows as arrays, then you would not need the function to get column names.
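A minimal sketch of that layout, assuming a `capi` object whose `sqlite3_step`, `sqlite3_column_count`, `sqlite3_column_text` and `sqlite3_column_name` already convert to and from JS values (the names follow the C API; the helper functions themselves are hypothetical). Because each helper is a separate top-level function rather than a class method, a bundler can drop `columnNames` from any build that never imports it:

```javascript
// Hypothetical function-per-feature layout. In a real ES module each helper
// would be its own `export function`, so unused ones can be tree-shaken away.
// `capi` is assumed to expose type-converting wrappers around the C API.
const SQLITE_ROW = 100; // value of the C constant SQLITE_ROW

// Collect the remaining rows of a prepared statement as arrays of text values.
// Stops at SQLITE_DONE or any error; a real version should check the result code.
function rowsAsArrays(capi, stmt) {
  const rows = [];
  while (capi.sqlite3_step(stmt) === SQLITE_ROW) {
    const n = capi.sqlite3_column_count(stmt);
    const row = [];
    for (let i = 0; i < n; i++) row.push(capi.sqlite3_column_text(stmt, i));
    rows.push(row);
  }
  return rows;
}

// Only needed by callers that want named columns.
function columnNames(capi, stmt) {
  const n = capi.sqlite3_column_count(stmt);
  return Array.from({ length: n }, (_, i) => capi.sqlite3_column_name(stmt, i));
}
```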

Eventually there should be a way to integrate with the OO APIs, and the library can definitely export something that mimics the current sqlite3 object from the init function (with all the relevant convenience code for supporting different call signatures). But if you import specific functions (that might be a bit more picky with their arguments) then those helpers won't even touch your final build.

Now the main question: What are the chances of integrating my changes upstream? (specifically to the GitHub repository - I understand the public domain related restrictions of the main source). I won't have the capacity to fully develop and maintain a fork given my dayjob, and I'd rather the benefits (of a smaller build) be available to more people. Also, there are only a limited set of features I'm interested in using, so I'm unable to fully check if my refactoring breaks anything (that said, is there a public test suite for the JS code? or is it also closed like the main tests?)

The work in progress is currently at my fork and I am developing on StackBlitz, so it is very messy at the moment.

(2) By Stephan Beal (stephan) on 2024-01-03 12:38:50 in reply to 1

Now the main question: What are the chances of integrating my changes upstream? (specifically to the GitHub repository - I understand the public domain related restrictions of the main source).

That's largely up to Thomas Steiner, who maintains the npm-related pieces on this project's behalf.

Background: as nobody in this project uses node.js in any capacity, we can't rightfully claim to support it. Our tiny team lacks the bandwidth to keep up with the ever-changing node-related ecosystem and to make an honest effort to support everyone's favorite frameworks. Thus we very intentionally restrict "this side" of the project to plain-vanilla JS, free of all dependencies on 3rd-party tools and frameworks, with the obligatory exception of requiring Emscripten to build it. Because npm is extremely popular, however, and we've had numerous requests to support it, we took the measure of handing that part of the maintenance to someone who is well-versed in that side of the JS ecosystem. Though the npm build and related pieces are maintained outside of this project's source tree, it does indeed fall under the umbrella of this project and is an "official" sub-project. We actively follow the npm project and participate in the github issue tracker, but do not take active part in that side of the development.

(that said, is there a public test suite for the JS code? or is it also closed like the main tests?)

All of the test code for the canonical build is in the public sqlite source tree but none of it is npm-/node-based - we deal only in plain-vanilla JS in this tree. To the best of my limited knowledge there is no test suite specifically for the npm build, but Thomas would know better about that.

(3) By randName on 2024-01-03 13:05:07 in reply to 2

plain-vanilla JS, free of all dependencies

I understand the plain JS requirement, and actively avoid dependencies if possible (hence JSDoc types). The only requirement is a current browser (due to OPFS and ESM). To clarify, my current code can be used as-is (as long as it is served by a server that sets MIME types correctly, which WASM files need anyway), without any npm or node involved.

Vite is merely a development tool which has some nice features for development, e.g. hot-reloading. But it is definitely not a requirement to get any work done.
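For reference, the MIME-type point matters because browsers only accept `WebAssembly.instantiateStreaming()` for responses served as `application/wasm`. A minimal loader sketch (the `instantiateWasm` name is hypothetical) that streams when it can and otherwise falls back to buffering the bytes:

```javascript
// Hypothetical loader: prefer streaming compilation, which browsers only allow
// when the server sends "Content-Type: application/wasm"; otherwise buffer.
async function instantiateWasm(response, imports) {
  const type = (response.headers.get("content-type") ?? "").split(";")[0].trim();
  if (typeof WebAssembly.instantiateStreaming === "function" && type === "application/wasm") {
    return WebAssembly.instantiateStreaming(response, imports);
  }
  // Fallback: read the whole body, then compile from bytes.
  return WebAssembly.instantiate(await response.arrayBuffer(), imports);
}
```

Usage would be along the lines of `const { instance } = await instantiateWasm(await fetch("sqlite3.wasm"), imports);`, where `imports` is whatever the particular build requires.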

(4) By Stephan Beal (stephan) on 2024-01-03 13:34:39 in reply to 2

Though the npm build and related pieces are maintained outside of this project's source tree, it does indeed fall under the umbrella of this project and is an "official" sub-project.

To clarify, though: that does not mean that we police what goes on in, or gets checked in to, the github-hosted npm-related repo. Thomas has free rein there and is welcome to accept whatever community contributions he likes. We support him where we can but leave all of the decision-making to him.

The core source tree, however, has strict requirements limiting checked-in content to that created/written by a team member.

(5) By Daniel Steigerwald (steida) on 2024-01-06 21:59:25 in reply to 4

As I see it, the SQLite team (you) could maintain simple, low-level, dependency-free functions that can be used by third parties to build higher-level APIs where tree shaking and other stuff can be implemented. JavaScript classes are not tree-shakeable; functions are.

(6) By Stephan Beal (stephan) on 2024-01-07 09:35:24 in reply to 5

As I see it, the SQLite team (you) could maintain simple, low-level, dependency-free functions that can be used by third parties to build higher-level APIs

We publish that - the direct exports from the wasm file, devoid of all type conversion and such, are part of the public API (with a handful of obligatory exceptions which are glue-level implementation details). The wrappers with argument/result type conversion are necessarily not dependency-free - they rely on our glue code suite to create those type-converting wrappers.
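To illustrate the distinction: a raw export that returns a `const char *` hands back a plain number (a pointer into wasm memory), and turning that into a JS string is exactly the kind of work the glue code's wrappers do. A hypothetical, stand-alone sketch of one such result converter, assuming the module's `WebAssembly.Memory` is available to the caller:

```javascript
// Hypothetical wrapper factory: adapts a raw wasm export that returns a pointer
// to a NUL-terminated UTF-8 string so JS callers receive a JS string instead.
function wrapCStringResult(rawFn, memory) {
  const decoder = new TextDecoder();
  return (...args) => {
    const ptr = rawFn(...args);
    if (ptr === 0) return null; // NULL pointer
    // Re-create the view on every call: memory.buffer changes if memory grows.
    const heap = new Uint8Array(memory.buffer);
    let end = ptr;
    while (end < heap.length && heap[end] !== 0) end++;
    return decoder.decode(heap.subarray(ptr, end));
  };
}
```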

Or am i misunderstanding what you mean by "dependency-free"?

(19) By randName on 2024-01-10 17:16:14 in reply to 6

also, after reading a few older threads I suddenly remembered something I wanted to ask that is unrelated to the build stuff but related to the original topic

is there a reason for the very relaxed argument handling of the JS functions? Accepting different types for a parameter is ok (e.g. converting a oo1.DB to its pointer), but I see that some functions accept reversed arguments (e.g. jsFuncToWasm), and there are a few helper functions just to normalize function args. Does that not defeat the purpose of the documented function signature?
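The pattern being asked about looks roughly like this in isolation (a hypothetical helper, not the library's actual code): an extra function whose only job is to accept either argument order, next to a strict alternative that accepts one order and therefore always matches its documented signature. The signature strings are placeholders:

```javascript
// Hypothetical "accept either order" helper: normalizes (func, sig) and
// (sig, func) into a single [func, sig] form.
function normalizeFuncAndSig(a, b) {
  if (typeof a === "function" && typeof b === "string") return [a, b];
  if (typeof a === "string" && typeof b === "function") return [b, a];
  throw new TypeError("expected (function, signature) or (signature, function)");
}

// Strict alternative: exactly one accepted order, so the declared signature
// and the runtime behaviour cannot drift apart.
function checkFuncAndSig(func, sig) {
  if (typeof func !== "function" || typeof sig !== "string") {
    throw new TypeError("expected (function, signature)");
  }
  return [func, sig];
}
```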

(21) By Stephan Beal (stephan) on 2024-01-10 17:37:13 in reply to 19

is there a reason for ... some functions accept reversed arguments (e.g. jsFuncToWasm), and there are a few helper functions just to normalize function args.

That particular case was a fluke of evolution. In some call contexts one form is more readable than the other, so it was made to support both. The effects that would have on downstream tools literally never crossed my mind at the time.

In hindsight, definitely not a great approach, but we're now stuck with it.

To the best of my recollection, none of the sqlite3_xyz() functions have that particular quirk - only the glue code. The "canonical APIs" (those derived directly from the C API) are all relatively strict, at least in their argument count, and most will throw if they're not provided the exact same number of arguments as their C counterparts. Where bogus types are passed in, we generally let them implicitly convert to integer on their way to WASM (0, in most cases).
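A hypothetical sketch of the strict-arity behaviour described above, wrapping any raw function with a declared argument count. It mirrors that behaviour only in outline and is not the project's wrapper code:

```javascript
// Hypothetical wrapper: throws unless called with exactly `arity` arguments.
function withStrictArity(name, rawFn, arity) {
  return function (...args) {
    if (args.length !== arity) {
      throw new Error(`${name}() requires ${arity} argument(s) but got ${args.length}`);
    }
    // Values pass through untouched. If rawFn is a wasm export with i32
    // parameters, the JS-to-wasm conversion (ToInt32) turns non-numeric
    // values into numbers, often 0 (e.g. undefined, null, "abc").
    return rawFn(...args);
  };
}
```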

Does that not defeat the purpose of the documented function signature?

It defeats for-tools-consumption declarations, perhaps, but the docs are written for human consumption and it doesn't defeat those. Tool-consumed declarations and docs were literally not on the radar at that phase of the development. Had they been, the oddball signatures might have been done differently. Every byte of this project's JS code was written in emacs, without any support from high-level IDEs/tools, and that admittedly has had some subtle unforeseen side effects vis-à-vis the downstream tools most JS devs use.

(22) By randName on 2024-01-11 00:57:06 in reply to 21

a fluke of evolution

funnily enough, that's the same story as the recurrent laryngeal nerve. Joke aside, I'm glad it wasn't a core decision and won't propagate further

stuck with it

not sure about that, since there have been client-breaking API changes before, so I don't see why breaking internal changes can't be made. Of course it is not nice to break implementations, so there's still the option of oo2 and friends

none of the sqlite3_xyz() functions have that particular quirk

from what I gathered over the past week or 2 I can confirm this, since most of those are created from the binding signatures, and exceptions (e.g. sqlite3_randomness) are properly documented on the C-style API page

without any support from high-level IDEs/tools

while I admire the effort and dedication, I can't say it is very sustainable. While the decision is not mine to make, I think quite a few of these issues can be mitigated by changing the mindset of what purpose the "official" JS codebase should serve

harkening back to what Daniel (steida) said, I think everything up to xWrap bindings (including Jaccwabyt) is a relatively good marker for "low-level functions", while everything above (oo1, worker1, VFSes) should just be clearly marked as reference implementations, which can be excluded from a build (like the "unsupported" one you suggested, but still released)

(23) By Stephan Beal (stephan) on 2024-01-11 03:50:06 in reply to 22

not sure about that since there were client-breaking api changes before, so I don't see why breaking internal changes can't be made.

It's not broken now, per se, just awkward in a few places, whereas removing the awkwardness would risk breakage. The policy against gratuitous breakage, and living with our design mistakes, is deeply ingrained in this project.

harkening back to what Daniel (steida) said, I think everything up to xWrap bindings (including Jaccwabyt) is a relatively good marker for "low-level functions", while everything above (oo1, worker1, VFSes) should just be clearly marked as reference implementations, which can be excluded from a build (like the "unsupported" one you suggested, but still released)

That is a perfect description of the intent, and work has started on a build without the "reference implementation" parts, but there's currently no estimate for when it will materialize.

Ideally, none (or very little) of the low-level interface would ever need to be exposed, but a large handful of those functions are often needed at the client level. They're definitely considered low-level, but also part of the public API, so gratuitous breakage isn't an option. Perhaps one day we'll do a new wasm "v2" build in which we can start from scratch and get a cleaner result.

I can't say it is very sustainable.

(Debate about the sustainability of code maintenance with age-old text editors vs. ever-changing high-level tools elided. ;)

I think quite a few of these issues can be mitigated by changing the mindset of what purpose the "official" JS codebase should serve

i will ponder that.

Again, thank you for your candid and thoughtful feedback.

(24) By randName on 2024-01-11 04:54:03 in reply to 23

debate about the sustainability of code maintenance

if the surface is small there is definitely no issue with tooling-free long-term maintenance (one of my goals with the demo). My warning about sustainability is more about keeping up with more and more JS things being added than about moving off your current environment

age-old text editors vs. ever-changing high-level tools

ultimately I don't think our positions differ too much :). I am still an avid user of [another CLI text editor] and even got typescript autocomplete working at one point, but the technical demands of work and the whole "ever-changing-ness" of it forced me to give that up eventually

I will be the first to admit that it is not a great position to be in, but the intersection of working in a large team and keeping up with the modern web means having to rely on tooling to provide things like type-checking. as a minor respite I can still enjoy using my key-binds with an extension installed

(7) By randName on 2024-01-09 02:58:04 in reply to 4

i think in this case, it is a large enough change that he prefers to check with the core team

to alleviate this apparent impasse, I would like to check what the team's current development setup is like, since this is what I imagine it to be, from looking at the source:

for example, when adding a new binding signature to the capi:

1. add it to sqlite3-api-glue.js
2. wait for emscripten to compile(?)
3. load the index.html in a browser to check that everything works(?)

i'm also not sure where I'm going with this, but I guess for starters those individual files/functions (e.g. sqlite3-api-glue.js, sqlite3-v-helper.js) could be exposed and distributed (similar to steida's comment) instead of being bound directly to sqlite3ApiBootstrap.initializers, which will allow downstream users to decide if they want to use those functions
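A hypothetical sketch of what such an opt-in bootstrap could look like, where the consumer passes only the initializers it wants instead of each file registering itself in a shared list. The `requires` check is an assumption added to make ordering constraints explicit; the real glue files also depend on build-time preprocessing, which no runtime check can replace:

```javascript
// Hypothetical opt-in bootstrap: run only the initializers the caller passes,
// in the given order, failing fast if one runs before something it needs.
function bootstrap(base, initializers) {
  const sqlite3 = { ...base, installed: [] };
  for (const { name, requires = [], init } of initializers) {
    for (const dep of requires) {
      if (!sqlite3.installed.includes(dep)) {
        throw new Error(`${name} requires ${dep} to be initialized first`);
      }
    }
    init(sqlite3); // each initializer decorates the shared object
    sqlite3.installed.push(name);
  }
  return sqlite3;
}
```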

(8) By Stephan Beal (stephan) on 2024-01-09 06:52:32 in reply to 7

Regarding the comments from the impasse ticket on github:

Then I'm not sure how to continue as it seems like the core team is unfamiliar with the current JS landscape (and I'm not saying this is something that is bad or needs to be changed; I certainly understand their concern about it being ever-changing).

You have characterized our position precisely :). We don't live in that ecosystem and have neither the bandwidth, incentive, nor inclination to keep up with its ever-changing tools, workflows, and fads. Because of that, we provide least-common-denominator components and leave the toolchain-specific workings to those who use those toolchains.

I'm not sure how to explain all of that, especially since there seems to be a misunderstanding that it requires node/npm knowledge

Tree-shaking, typings, and processing/validation of JSDoc all require third-party tools, noting that every such tool requires the inseparable node/npm pair.

My largest concern (and why I'm trying to start this conversation in the first place) is that the adoption of this "official" library will probably be affected by the bundle size, and yes since it is open source it can just be forked...

The size concern has been raised a few times the past year but (A) only a few and (B) seems to be largely unwarranted. End users do not seem to be overly concerned about sizes. There are popular websites which push a meg or more just of CSS. The last time i checked, GDrive loaded some 14MB of stuff. Our self-imposed ceiling, suggested to us by a group of full-time third-party web devs at the outset of the wasm effort, has always been 1MB compressed and we're still well under that.

Regarding forking: we would love to see folks create alternatives which suit their environments, philosophies, and tools better. We can't produce a package which is all things to all people and we have never intended to have "the" solution to browser-side sqlite, just one of any potential number of them. It has been my sincere hope to see it either be forked or to provide inspiration for completely separate implementations, but that hasn't (to the best of my knowledge) been the case.

for example there is a new binding signature to the capi...

It's somewhat more involved than that but that's a fair high-level summary.

guess for starters those individual files/functions (e.g. sqlite3-api-glue.js, sqlite3-v-helper.js) could be exposed and distributed

Those individual files are all internal details of the build process and cannot simply be used in isolation. They're split up largely for maintenance reasons and to facilitate reusing them in as-yet-inconceived alternate builds. Their order in the build is significant, as most depend on parts from previous ones, and several require preprocessing depending on their target environment. See ext/wasm/api/README.md for human-readable details and ext/wasm/GNUmakefile for the less-human-readable implementation.

If, at this point, you have an itch to explore a fork or alternate build, i'd be more than happy to help bootstrap that provided it wouldn't require my having to go down the node.js rabbit hole. If, on the other hand, you have ideas which move the core in your desired direction and (A) don't require changes to existing APIs (something we avoid like the plague across this whole project) and (B) are 100% free of dev-time node.js, i'm all ears.

PS: please pardon my brevity and lack of external links - this was typed one-handed on a tablet.

(9) By randName on 2024-01-09 07:46:46 in reply to 8

It has been my sincere hope to see it either be forked or to provide inspiration for completely separate implementations, but that hasn't (to the best of my knowledge) been the case

that considered, I think it would be better to provide one extremely minimal JS output that only exposes the raw wasm exports without any of the type-converting glue code. You mentioned earlier that that is already available, but there is still a minimal set of options* (e.g. wasi_snapshot_preview1) to provide to the constructor so that the WebAssembly.Instance doesn't crash. Since those are part of the emscripten build process, it would be good to insulate anyone attempting a fork from changes in that part.

I will try to spin up the emcc stuff and figure out what that entails, but ideally the addition is small enough that Thomas feels comfortable including as an alternative export in the npm bit

*most of them can just be no-ops or return 0.

brevity

No worries about that, I've read your self-introduction; hopefully that warning saves a few people from suffering the same fate

(10) By Stephan Beal (stephan) on 2024-01-09 08:30:31 in reply to 9

You mentioned earlier that that is already available, but there is still a minimal set of options* (e.g. wasi_snapshot_preview1) to provide to the constructor so that the WebAssembly.Instance doesn't crash. Since those are part of the emscripten build process, it would be good to insulate anyone attempting a fork from changes in that part.

That list of imports can change at any time - it's an Emscripten internal detail we have no control over. The list is generated by the compilation step and may change depending on compilation flags, i.e. each pair of compiled wasm and JS files is inseparably linked.

Over the weekend i played with creating a new build which just has the wasm exports and ran into the imports problem. We can't distribute such a build without distributing the Emscripten glue which provides those imports. Such builds are trivial for developers to create themselves - a single emcc invocation.

If we were to take on the distribution of such a build we would also have to take on the tremendous overhead of writing and maintaining a necessarily separate test suite for it which is devoid of the non-trivial type conversion glue. That maintenance belongs with someone who actually needs such an esoteric build.

(again, mobile device)

(11) By randName on 2024-01-09 09:23:51 in reply to 10

after a bit of tinkering (JS side), I think I managed to find a good compromise - using a Proxy to allow unknown keys to be shimmed. At this point I believe that is sufficient to prevent LinkErrors on instantiation, while allowing runtime errors to be thrown if those unimplemented functions are invoked.
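A minimal sketch of that Proxy approach, assuming every unknown import is a function (the `createShimmedImports` name is hypothetical). Linking only checks that each imported function is callable, so a stub satisfies it and throws only if the module actually calls it. Non-function imports cannot be stubbed this way: a module that imports its memory (for example `env.memory`) still needs a real `WebAssembly.Memory` passed in `provided`:

```javascript
// Hypothetical importObject factory: every namespace (env,
// wasi_snapshot_preview1, ...) is a Proxy that returns the real import when
// one was provided and otherwise a stub that throws only when invoked.
function createShimmedImports(provided = {}) {
  const makeNamespace = (moduleName) =>
    new Proxy(provided[moduleName] ?? {}, {
      get(target, field) {
        if (field in target) return target[field];
        return () => {
          throw new Error(`unimplemented wasm import ${moduleName}.${String(field)} was called`);
        };
      },
    });
  // Top-level Proxy: any namespace name yields a shimmed namespace object.
  return new Proxy({}, { get: (_target, moduleName) => makeNamespace(String(moduleName)) });
}
```

Usage would look like `WebAssembly.instantiate(bytes, createShimmedImports({ env: { memory } }))`, supplying real implementations only for the imports the application actually exercises.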

In that case, I feel no need to push so strongly for any code changes to be made upstream. However, I might be back with some writeups for the Cookbook

(12) By Stephan Beal (stephan) on 2024-01-09 09:57:11 in reply to 11

At this point I believe that is sufficient to prevent LinkErrors on instantiation, while allowing runtime errors to be thrown if those unimplemented functions are invoked

i suspect (without having actually tried it) that you'll end up getting such errors the moment anyone tries to create (for example) a non :memory: db or if the library internally tries to use on-disk storage for temp space. The imports contain a number of APIs which enable Emscripten to transparently proxy the POSIX I/O...

$ wasm-objdump -j import -x sqlite3.wasm
...
Import[35]:
...
 - func[10] sig=2 <env.__syscall_ioctl> <- env.__syscall_ioctl
 - func[11] sig=5 <wasi_snapshot_preview1.fd_write> <- wasi_snapshot_preview1.fd_write
 - func[12] sig=5 <wasi_snapshot_preview1.fd_read> <- wasi_snapshot_preview1.fd_read
<...snip>

If it weren't for that particular API aspect, i'm about 95% certain we could become independent of Emscripten and build using plain old clang.
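The same import list can also be read from JS at runtime, without wasm-objdump, via the standard `WebAssembly.Module.imports()` static method. A small sketch that groups the imports by namespace (the helper name is hypothetical); since the list depends on the compilation flags, it is worth re-checking after every rebuild:

```javascript
// Group a compiled module's imports by namespace, e.g. env vs
// wasi_snapshot_preview1. WebAssembly.Module.imports() returns
// [{ module, name, kind }, ...] in declaration order.
function summarizeImports(module) {
  const byNamespace = {};
  for (const { module: ns, name, kind } of WebAssembly.Module.imports(module)) {
    (byNamespace[ns] ??= []).push(`${name} (${kind})`);
  }
  return byNamespace;
}
```

In a browser the module would typically come from `await WebAssembly.compileStreaming(fetch("sqlite3.wasm"))`.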

In that case, I feel no need to push so strongly for any code changes to be made upstream. However, I might be back with some writeups for the Cookbook

That would be wonderful :).

Please do feel free to push for changes, but also please understand that i'm likely to push back if (A) it would require backwards-compatibility breakage¹, (B) it would impose a significant ongoing maintenance burden, or (C) i've not had my coffee yet ;). FWIW, neither the up-front dev effort nor short-term lack of coffee are of as much a consideration as long-term maintenance burden is.

Thank you for your continued feedback!

¹ This project's long-standing culture precludes most potential backwards-incompatible changes.

(13) By randName on 2024-01-09 11:07:07 in reply to 12

... you'll end up getting such errors the moment ...

this is correct, but the key is that the error only occurs if that happens; otherwise the functions are mostly unused, as you suspected (they are still accessed, which causes the LinkError when they are absent)

long-term

hence my original goal of making sure that any improvements still survive even if i'm not around to maintain them. kudos to the team for the commitment

(14) By randName on 2024-01-10 04:02:11 in reply to 11

Have made a small demo with a writeup here: https://github.com/randName/sqlite-wasm-minimal-demo

Do let me know if I should use any specific wording (especially in the background section). I've also used the Unlicense to release it to the public domain, so hopefully it will be easier to integrate into the cookbook

(15) By Stephan Beal (stephan) on 2024-01-10 15:22:17 in reply to 14

Have made a small demo with a writeup here:

Thank you :).

Regarding the minimal build without all of the JS glue...

It occurred to me today that we can indeed add an "unsupported" build for that in the makefile, so long as we don't offer a download of it (at which point it becomes a testing- and backwards-compatibility liability). That's essentially how WASMFS builds are set up.

That would permit folks who want such a build to run "make minimal" (or whatever) locally to get such a build, then copy it over into their project.

Would that be of any use to you, or would that still be a dead end until/unless it's available from npm (noting that npm counts as a downloadable distribution)?

(16) By randName on 2024-01-10 16:44:36 in reply to 15

add an "unsupported" build for that in the makefile

when you say build do you mean a JS output or a .wasm output? I will assume the goal is to have only one .wasm output so the rest of this reply assumes you are referring to a JS output

that should be helpful for people who want to proceed with writing their own glue-code, but with the assurance that if something goes wrong (which should be rare) they can generate the "unsupported" build and inspect it. But specifically for that purpose I won't know how useful it is until I see what the actual output is like, so please don't feel a need to prioritise this (whatever I have in the demo should be good enough for a while)

as a side-note, personally I don't think it is an issue to set up a local build env occasionally just to generate that JS file (it could even be done with containers or something). I've been spoilt by tools like StackBlitz that manage to get a node engine running in the browser, so I haven't actually set up anything locally other than my day-job codebase

(17) By Stephan Beal (stephan) on 2024-01-10 16:55:04 in reply to 16

when you say build do you mean a JS output or a .wasm output? I will assume the goal is to have only one .wasm output so the rest of this reply assumes you are referring to a JS output

Strangely enough, we can't build them in isolation - i actually asked the Emscripten devs about that because our build process currently requires re-building the .wasm file multiple times just to get different .js files. Because the resulting .js includes the imports, it is necessarily generated along with the .wasm file.

But yes: the end result is that the .wasm file, even if compiled 3 or 4 times, is identical, with only the resulting .js files being different.

But specifically for that purpose I won't know how useful it is until I see what the actual output is like, ...

It would essentially be the result of:

emcc -o sqlite3.wasm ...

where ... includes some of the required -sXYZ flags and whatnot, but not the --pre-js and --post-js which we currently use to prepend/append our own glue bits.

i will add such a build to our makefile in the coming days and post back when that's done.

Presumably such a build should be ES6, rather than vanilla JS? (That changes the output very slightly.)

(18) By randName on 2024-01-10 17:07:46 in reply to 17

ES6, rather than vanilla JS

yes, ideally. I'm also not sure what "configurable strings" are being replaced for the bundler-friendly builds, but ideally it should be bundler-friendly too.

(20) By Stephan Beal (stephan) on 2024-01-10 17:18:08 in reply to 18

I'm also not sure what "configurable strings" are being replaced for the bundler-friendly builds, but ideally it should be bundler-friendly too.

The actual textual differences in the builds are minimal and trivial.

The "bundler-friendly" builds exist because bundlers reportedly require static strings in some places in order to be able to find files. In other builds a dynamically-resolved string can be used. For example:

//#if target=es6-bundler-friendly
    new Worker(new URL("sqlite3-opfs-async-proxy.js", import.meta.url));
//#elif target=es6-module
    new Worker(new URL(options.proxyUri, import.meta.url));
//#else
    new Worker(options.proxyUri);
//#endif

The main difference in ES6 builds, compared to vanilla JS, is that they use import.meta.url in some URL constructor calls, as in the above snippet.
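The resolution step behind those variants can be shown without any worker at all: `new URL(relative, base)` resolves a path against the module's own URL, which in the ES6 builds is `import.meta.url`. Bundlers such as Vite and webpack 5 recognize the literal `new URL("...", import.meta.url)` form statically, which is presumably why the bundler-friendly variant hard-codes the file name. A sketch with a hypothetical helper and placeholder URLs:

```javascript
// Hypothetical helper: resolve a worker script relative to the URL of the
// module that loads it (in a real ES module, pass import.meta.url as moduleUrl).
function resolveWorkerUrl(moduleUrl, proxyUri) {
  return new URL(proxyUri, moduleUrl).href;
}

// Placeholder URLs for illustration only.
console.log(resolveWorkerUrl("https://example.com/dist/sqlite3.mjs", "sqlite3-opfs-async-proxy.js"));
```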

The "bundler-friendly" build flag implies ES6, in any case, and i'll be sure to create the minimal build with that flag enabled. If we need separate "plain ES6" and "bundler-friendly-ES6" builds, that's also an option.

(25.2) By randName on 2024-01-11 15:45:14 edited from 25.1 in reply to 20

I just realised that these would not matter for any build that doesn't touch the Worker constructor, so the good news is that there probably won't be a need for alternate variants with these low-level builds

Edit:

I just got a containerised build setup working (based off the emsdk Docker container), but have not actually run the .wasm output to try yet. there were some awk: not an option: -e warnings but it doesn't look like critical stuff is affected.

So the final output JS is pretty much the stuff I threw away in the initial stages of my experimenting (the Module object and the POSIX FS stuff), which actually means that any DIY loader for an emscripten project should serve that purpose well. Unfortunately it is not something I would be using, but I must say I have learnt a lot through this so nothing has been wasted

Edit 2:

On second thought, the build will at least expose the mangled names (for a non -g3 build), so it might come in handy

(26) By Stephan Beal (stephan) on 2024-01-11 15:59:14 in reply to 25.1

there were some awk: not an option: -e warnings but it doesn't look like critical stuff is affected.

That particular invocation probably isn't critical (just injecting version info), but why -e isn't supported is a mystery. i'll look into that.

Unfortunately it is not something I would be using, but I must say I have learnt a lot through this so nothing has been wasted

Welcome to my world ;).

Related...

i've been looking into a build minus the "reference implementation" stuff and it arguably doesn't save enough download size to be worth the effort. We need to include OPFS because OPFS is literally the whole reason this subproject was started. People have been using sqlite in the browser for 10+ years without persistence via sql.js, and can continue to do so, but OPFS is the "killer feature" introduced in the past 18 months¹. Because OPFS "has" to be included, the only parts which we can reasonably strip are oo1, worker1, and promiser. The grand-total savings is only something like 40kb of JS after comments/docs are stripped. That's a tiny fraction of a second of download and parsing time, which hardly seems worth any effort to squeeze it out of the build. Stripping out the two OPFS VFSes would save another 55k, but that seems rather pointless.

That said...

Producing a "bare metal" build (for lack of a better term) which includes only the core API's wasm exports, with no glue code, is still on my list/underway as time and energy permit. That's all happening off-trunk until after the 3.45 release.

¹ i'm kinda partial to the localStorage/sessionStorage-hosted dbs myself, but they're extremely limited in size.

(27.1) By randName on 2024-01-11 17:13:46 edited from 27.0 in reply to 26

OPFS is the "killer feature"

strongly agree, even if the http-vfs was what caught my attention even earlier

at this point I've already dipped my hand into building the wasm from source and it was relatively painless with the emsdk container, so I'll probably be spending the next few days throwing more and more -DSQLITE_OMIT_*s at emcc.

I also realise it might be possible to swap out the emscripten FS functions with calls to the OPFS (with the relevant workarounds for SyncAccessHandle, or JSPI), which will save on installing the VFS, so keeping some of that around might actually be good
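A heavily hedged sketch of that idea: a WASI-style `fd_write` import (one of the imports in the wasm-objdump listing earlier in the thread) implemented on top of an object with the `FileSystemSyncAccessHandle.write(buffer, { at })` shape. The `files` table and `makeFdWrite` are hypothetical, and which syscall imports sqlite3.wasm actually uses for database I/O depends on the Emscripten build, so this only illustrates the mechanics:

```javascript
// HYPOTHETICAL: WASI fd_write backed by a SyncAccessHandle-like object.
// `getMemory` returns the module's WebAssembly.Memory; `files` maps wasm
// file descriptors to { handle, position } records kept by the caller.
const WASI_ESUCCESS = 0;
const WASI_EBADF = 8;

function makeFdWrite(getMemory, files) {
  return function fd_write(fd, iovs, iovsLen, nwrittenPtr) {
    const entry = files.get(fd);
    if (!entry) return WASI_EBADF;
    const buffer = getMemory().buffer;
    const view = new DataView(buffer);
    let total = 0;
    // Each iovec is 8 bytes: u32 pointer, then u32 length (little-endian).
    for (let i = 0; i < iovsLen; i++) {
      const ptr = view.getUint32(iovs + i * 8, true);
      const len = view.getUint32(iovs + i * 8 + 4, true);
      // write(buffer, { at }) returns the number of bytes written.
      const written = entry.handle.write(new Uint8Array(buffer, ptr, len), { at: entry.position });
      entry.position += written;
      total += written;
    }
    view.setUint32(nwrittenPtr, total, true); // report bytes written back to wasm
    return WASI_ESUCCESS;
  };
}
```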