Technical Gotchas: Potential Tripping-points for Clients

See also: WASM-JS peculiarities

Gotchas:

In programming, a gotcha is a valid construct in a system, program or programming language that works as documented but is counter-intuitive and almost invites mistakes because it is both easy to invoke and unexpected or unreasonable in its outcome.

Source: Wikipedia

Though every effort is made to shield users from unfortunate gotchas, some cannot be avoided...

Relative URI Resolution when Loading sqlite3.js from a Worker

Two problems compound to create a significant gotcha for loading sqlite3 via a Worker:

  1. Resolution of relative URIs differs depending on how the script is loaded.
  2. It is impossible, from JS, to determine the currently-loading script's URI when it is loaded via importScripts().

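This becomes a problem when sqlite3.js resides in a directory other than the one in which the client application lives and the client wants to load it using a Worker. A perfectly sensible usage pattern is for the application to start a Worker from a script in its own directory (say, foo.js) and for that script to call importScripts('path/to/sqlite3.js').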

However, that will fail because sqlite3.js will not be able to find sqlite3.wasm (and related files) because of how relative URIs are resolved. The workaround for that is to tell the library where it lives when instantiating the Worker with a URI argument:

new Worker('foo.js?sqlite3.dir=path/to');

Then, from foo.js:

importScripts('path/to/sqlite3.js');

When sqlite3.js is loaded via importScripts(), the only URL the JS environment exposes to it is the one from which the containing Worker is loaded, which leads to sqlite3.js being unable to resolve sqlite3.wasm.

As a workaround, sqlite3.js inspects the URL arguments for the one URL it can see (the one passed to the Worker's constructor). If it finds sqlite3.dir, it will attempt to load other sqlite3-related files from that directory. If it does not find that, it will do its best to figure out the correct path, falling back to the current directory (which only works if it is in the same directory as the client application).
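The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual code: sqlite3DirFromUrl() is a hypothetical helper, and it covers only the explicit sqlite3.dir argument and the final current-directory fallback, not any intermediate guessing the library may do. In a Worker, the URL passed in would be self.location.href.

```javascript
// Hypothetical helper (not part of sqlite3.js): returns the directory named
// by a sqlite3.dir URL argument, or a fallback when the argument is absent.
function sqlite3DirFromUrl(href, fallbackDir = '.') {
  // URL#searchParams percent-decodes argument values for us.
  const dir = new URL(href).searchParams.get('sqlite3.dir');
  // Treat a missing or empty argument as "not provided".
  return dir ? dir : fallbackDir;
}

// Example inputs use a made-up origin purely for illustration.
const withArg = sqlite3DirFromUrl('https://example.com/app/foo.js?sqlite3.dir=path/to');
const withoutArg = sqlite3DirFromUrl('https://example.com/app/foo.js');
```

Here withArg names the directory given in the URL, while withoutArg falls back to the current directory, which is only correct when sqlite3.js sits next to the client application.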

Unfortunately, the sqlite3.dir path must be duplicated in the URI to foo.js and in the importScripts() call. Eliminating that duplication in the latter takes considerably more code than the duplication itself. For example:

let sqlite3Js = 'sqlite3.js';
const urlParams = new URL(self.location.href).searchParams;
if(urlParams.has('sqlite3.dir')){
  sqlite3Js = urlParams.get('sqlite3.dir') + '/' + sqlite3Js;
}
importScripts(sqlite3Js);

Also unfortunately, URL arguments passed along with importScripts() arguments are simply ignored, as the URI provided to importScripts() is not available to the script being loaded that way. Its current URL will resolve to the Worker script which loads it (in the above example, foo.js), so the sqlite3.dir URL argument needs to be applied when loading that script.
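On the side which creates the Worker, the directory can at least be kept in a single constant and attached to the Worker script's URL. The following is a minimal sketch: workerUrlWithSqlite3Dir() and SQLITE3_DIR are hypothetical names, not part of the sqlite3 API, and it assumes the receiving side reads the argument through URL.searchParams (as the foo.js snippet above does), so that percent-encoding of the path is undone on the other end.

```javascript
// Hypothetical helper (not part of the sqlite3 API): appends a sqlite3.dir
// URL argument to the URL of a Worker script.
function workerUrlWithSqlite3Dir(workerScript, sqlite3Dir) {
  // URLSearchParams percent-encodes the value, e.g. '/' becomes '%2F'.
  const query = new URLSearchParams({ 'sqlite3.dir': sqlite3Dir });
  return workerScript + '?' + query.toString();
}

// Single place where the application records where sqlite3.js lives.
const SQLITE3_DIR = 'path/to';
const workerUrl = workerUrlWithSqlite3Dir('foo.js', SQLITE3_DIR);

// In a browser, the Worker would then be started with:
//   const w = new Worker(workerUrl);
```

The importScripts() call inside foo.js still needs the same path, so this only removes the duplication from the main-thread code.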

Note that the above is not an issue when loading sqlite3.js, or one of its supplemental JS files, via a <script> tag, as an ES6 module, or directly via the Worker constructor:

const w = new Worker('path/to/sqlite3-worker1.js');

That will do the right thing because the script has enough state to figure out which directory it needs to load sqlite3.js and sqlite3.wasm from.

For more details about how relative URIs are resolved in different contexts, see:

https://zzz.buzz/2017/03/14/relative-uris-in-web-development/

WASM Heap Corruption is Easy!

WASM's view of memory is a simple flat byte array which, with only a small handful of exceptions, offers no data type safety. Unlike C compilers, which can warn at compile time about attempts to apply data type X to memory which has been declared as type Y, WASM attaches no type information to its memory at all.

What does this mean? It means that it's absolutely trivial to corrupt the WASM heap without intending to do so:

sqlite3.wasm.poke( 42, 0x1234, 'i32' );

We've just overwritten 4 bytes of the WASM heap, at address 42, with the value 0x1234. That might or might not result in misbehavior at some indeterminate point later on. Such corruption, just like heap corruption in C, will have entirely unpredictable effects with entirely unpredictable timing. Unlike C, however, we do not have heap-analysis tools like the indispensable valgrind to help us in WASM.

To be clear: heap corruption in WASM is limited to the memory inside the WASM environment's sandbox. It is impossible, barring serious bugs in the host WASM engine, to corrupt memory outside of the WASM environment from within the WASM environment.
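The absence of type information can be demonstrated without sqlite3 at all. The sketch below uses a standalone WebAssembly.Memory as a stand-in for the sqlite3 WASM heap; the address and the stored string are arbitrary choices made for illustration, and no sqlite3 API is involved.

```javascript
// Standalone illustration: a private WebAssembly.Memory stands in for the
// sqlite3 WASM heap. No sqlite3 API is used here.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64KiB page
const heapBytes = new Uint8Array(memory.buffer);
const heapView = new DataView(memory.buffer);

// Pretend some C code stored an SQL string at this (arbitrary) address.
const address = 64;
const sqlBytes = new TextEncoder().encode('SELECT 1');
heapBytes.set(sqlBytes, address);

// An unrelated 32-bit integer write to the same address succeeds silently:
// nothing records that these bytes "belong" to a string.
heapView.setInt32(address, 0x1234, true /* little-endian, as in WASM */);

// Reading the string back now yields garbage in its first four bytes.
const corrupted = new TextDecoder().decode(
  heapBytes.subarray(address, address + sqlBytes.length)
);
```

After the integer write, the first four bytes of the stored text are 0x34 0x12 0x00 0x00, so the memory no longer holds the SQL that was put there, and neither write raised an error or a warning.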

In short, if a JS application starts throwing completely inexplicable errors, such as throwing an exception here:

myDb.exec("SELECT 1");

claiming that there's an SQL syntax error, the culprit is almost certainly memory corruption. (Yes, that particular symptom of memory corruption has in fact happened before. Another symptom seen more than once is an exception from WASM claiming that a called function has an invalid signature.)

Unfortunately, there is no good formula for tracking down such corruption, and it might not even show up until a month later. The best one can do, in terms of finding the cause, is to backrev to a version of the app which does not exhibit the problem, then "bisect" (in the SCM sense of the term), or step one version at a time, until a version is found which exhibits the problem, and then look for differences which might account for it. Anything which writes to the WASM heap is a potential suspect.

Good luck!