SQLite Forum

WASM opfs & multiple tabs

(1) By mlaw (tantaman) on 2022-11-09 20:39:05 [source]

The docs state that the OPFS VFS should be usable from multiple tabs so long as a transaction is not open and locking the file.

I'm running into a case where I'm making a single write (create table if not exists ...) and trying to open the db in two tabs.

The first tab succeeds, the second tab throws:

sqlite3-opfs-async-proxy.js:63 OPFS asyncer: Error getting sync handle. Waiting 900 ms and trying again. p2pwdb-todo-example-journal DOMException: Access Handles cannot be created if there is another open Access Handle or Writable stream associated with the same file.
sqlite3-opfs-async-proxy.js:63 OPFS asyncer: xRead() failed GetSyncHandleError: Error getting sync handle. 4 attempts failed. p2pwdb-todo-example-journal : Original exception [NoModificationAllowedError]: Access Handles cannot be
sqlite3.js:12084 OPFS syncer: xRead() async error: GetSyncHandleError: Error getting sync handle. 4 attempts failed. p2pwdb-todo-example-journal : Original exception [NoModificationAllowedError]: Access Handles cannot be created if there is another open Access Handle or Writable stream associated with the same file.
comlink.ts:265 Uncaught (in promise) SQLite3Error: sqlite result code 266: disk I/O error
    at toss3 (sqlite3.js:9531:35)
    at checkSqlite3Rc (sqlite3.js:9569:7)
    at opfsUtil.OpfsDb.ctor (sqlite3.js:9681:9)
    at new opfsUtil.OpfsDb (sqlite3.js:13111:37)
    at SQLite3.open (wrapper.ts:24:21)
    at Object.open (comlinkable.ts:58:25)
    at callback (comlink.ts:312:36)

(2) By Stephan Beal (stephan) on 2022-11-09 23:24:04 in reply to 1 [link] [source]

The docs state that the OPFS VFS should be usable from multiple tabs so long as a transaction is not open and locking the file.

Technically speaking, it's so long as an sqlite3_vfs::xLock() is not active. Exactly when that happens is above my proverbial pay grade, though :/.

The first tab succeeds, the second tab throws:

Locking in OPFS is going to be painful, no doubt. We've done what we can within the limited capabilities of OPFS, but their locking APIs are still evolving and may offer new solutions at some point.

Our current approach is to acquire a writable handle when sqlite3_vfs::xLock() is called and release it (keeping a read-only handle) when sqlite3_vfs::xUnlock() is called. If acquiring a lock handle fails, it retries 4 times, with an increasing wait interval between attempts, before the exception you pasted is propagated.
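That retry strategy can be sketched roughly as follows. This is illustrative only: the function names, the starting interval, and the backoff factor are assumptions, not the actual VFS code.

```javascript
// Bounded retry-with-backoff for acquiring a sync access handle.
// acquireHandle is a stand-in for the real createSyncAccessHandle() call.
const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

async function getSyncHandleWithRetry(acquireHandle, maxAttempts = 4) {
  let waitMs = 100; // hypothetical starting interval
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await acquireHandle();
    } catch (err) {
      if (attempt === maxAttempts) {
        throw new Error(`Error getting sync handle. ${maxAttempts} attempts failed.`);
      }
      await sleep(waitMs);
      waitMs *= 2; // back off before the next attempt
    }
  }
}
```

The real proxy's intervals differ; the point is only the bounded retry-with-backoff shape that produces the "Waiting … and trying again" messages in the pasted log.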

That's literally as fine-grained as we can currently do for OPFS, so i'm not sure that we have a solution to your problem.

(12) By anonymous on 2022-11-17 16:24:00 in reply to 2 [link] [source]

Locking in OPFS is going to be painful, no doubt. We've done what we can within the limited capabilities of OPFS, but their locking APIs are still evolving and may offer new solutions at some point.

I think you should try using the Web Locks API to implement SQLite locking. I know you're averse to that, but:

  • Web Locks are supported everywhere OPFS is (and where OPFS is expected to arrive).
  • You can test for Web Locks support and fall back to something else if you don't trust that will always be the case.
  • You can wait asynchronously on a Web Lock, so no polling (and no having to tune your polling for different browsers and machines), which could address the problem described in this thread.
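A minimal sketch of that suggestion, assuming a `withFileLock()` wrapper (hypothetical name) that uses `navigator.locks` when present and falls back to a simple in-process promise-queue mutex otherwise:

```javascript
// Feature-detect the Web Locks API, as the second bullet suggests.
const haveWebLocks =
  typeof navigator !== "undefined" && "locks" in navigator;

// Fallback: a promise-chain mutex keyed by lock name (single-process only).
const fallbackQueues = new Map();

function withFileLock(name, criticalSection) {
  if (haveWebLocks) {
    // Waits asynchronously; no polling and no tuning of retry intervals.
    return navigator.locks.request(name, criticalSection);
  }
  const prev = fallbackQueues.get(name) ?? Promise.resolve();
  const next = prev.then(() => criticalSection());
  // Keep the chain alive even if the critical section throws.
  fallbackQueues.set(name, next.catch(() => {}));
  return next;
}
```

Callers queue on the lock rather than retrying on a timer, which is the latency argument made in the third bullet.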

(13) By Stephan Beal (stephan) on 2022-11-17 17:08:00 in reply to 12 [link] [source]

Web Locks are supported everywhere OPFS is (and where OPFS is expected to arrive).

Definitely a fair argument!

You can wait asynchronously on a Web Lock, so no polling (and no having to tune your polling for different browsers and machines), which could address the problem described in this thread.

We have to poll anyway because the only way we currently have to fake synchronous access to the async OPFS API (recall that sqlite's API is 100% synchronous) is to use Atomics.wait() and Atomics.notify(). Since we're polling constantly anyway, we just interrupt the polling every half a second or so to give up any implicit locks we acquired. We can, if needed, drop that time to 100ms - real-world experience will guide us.

My point in that is only that the polling option is "free" for us - we're just piggybacking on it to release any implicit locks. WebLocks would require more development effort and may or may not perform better. That's not an option i'm ruling out, by any means, especially after considering your first argument. It's not at the very top of the TODO list, though. Before doing so i'll need to come up with a test app which runs in multiple tabs at once, or simulates it with multiple workers, so that we can evaluate the overall effectiveness of the current approach and WebLocks meaningfully.

Thank you for the feedback! We are especially eager for it in these early days so that we can improve the API before imposing backwards-compatibility constraints in a future release.

(14) By anonymous on 2022-11-18 16:03:06 in reply to 13 [link] [source]

We have to poll anyway because the only way we currently have to fake synchronous access to the async OPFS API (recall that sqlite's API is 100% synchronous) is to use Atomics.wait() and Atomics.notify(). Since we're polling constantly, anyway, we just interrupt the polling every half a second or so to give up any implicit locks we acquired.

There actually appear to be two places where you "poll". One is the one you describe at waitLoop() (I don't consider this technically polling but that's just semantics) and the other is in getSyncHandle(). They both add latency when there is contention by sleeping while work is pending - getSyncHandle() for database files and waitLoop() for journal files - and additionally getSyncHandle() can cause starvation. I think it is incorrect to say that the existence of one makes the other one "free" - the latencies will overlap but not perfectly - and I think you can avoid both of these latencies and the starvation (and the need for tuning the magic numbers for the intervals).

For database files, I believe that implementing xLock/xUnlock using Web Locks as suggested earlier should remove both latency and starvation.

For journal files, I would try releasing the OPFS access handle in the xUnlock call for its database file. This could give up some performance over many individually locked small transactions where the current code could reuse the access handle (or it might be negligible), but users can recover this optimization by collecting transactions in a PRAGMA locking_mode=EXCLUSIVE block.

(15) By Stephan Beal (stephan) on 2022-11-18 17:09:36 in reply to 14 [link] [source]

(I don't consider this technically polling but that's just semantics) and the other is in getSyncHandle(). They both add latency when there is contention by sleeping while work is pending - getSyncHandle() for database files and waitLoop() for journal files - and additionally getSyncHandle() can cause starvation

getSyncHandle() isn't polling - it's just a proxy for an async OPFS call which we have to "await" on. i've argued several times with the Google folks that an async API for OPFS is misguided because async APIs exist to account for high latency and OPFS has no latency, but that's a battle i've long since lost. They are not interested in having a 100% synchronous API for OPFS. (It would, however, immediately gain us some 30-40% in performance, as that's the amount of time the OPFS proxy spends waiting on cross-thread communication barriers.)

Obtaining a sync handle is the only way to lock an OPFS file, and every one of our APIs which uses it needs that handle for the underlying OPFS-level operation. In order to alleviate contention, we also have to release that handle in a timely manner. There's no way around that.

For database files, I believe that implementing xLock/xUnlock using Web Locks as suggested earlier should remove both latency and starvation.

Unfortunately, that doesn't solve the problem. That was our first attempt, but there are sqlite internals which do not call xLock/xUnlock, which is why the other APIs have to use getSyncHandle() at all.

For journal files, I would try releasing the OPFS access handle in the xUnlock call for its database file.

When sqlite's internals call xLock and xUnlock, the OPFS layer unconditionally releases the lock in xUnlock. When the internals don't call xLock/xUnlock, the OPFS proxy has no choice but to either hold on to the lock forever or release it after some delay.1 The first option, holding the lock forever, has proven to cause 100% contention between tabs. We're currently experimenting with the second approach, releasing "implicit locks" after a brief delay. IMO it "should" resolve most contention, at least for 2-3 tabs (and we'll never claim that 5+ tabs to the same db is a sane design approach), but only experience will tell us whether that's really the case.

We've yet to hear any feedback that the current approach doesn't help. i'm not currently convinced that WebLocks gain us any concurrency at all because we would still have to lock and unlock in exactly the same places, and exactly the same conditions, as we do now. i'm not going to commit to trying WebLocks until/unless there's a truly compelling reason to do so (e.g. we have strong contention issues between as few as 2-3 tabs).

That's not to rule out WebLocks entirely, but Real Life(TM) priorities force me to be choosier than i'd normally be about what experiments i can commit time to, and experimenting with WebLocks is currently way down the list of priorities. Until i can get an app in place which simulates concurrency issues, we'd have no way of meaningfully comparing WebLocks and the current approach, and that app, alas, is also not near the top of the current list of priorities.


  1. ^ If we release "implicit locks" during the operation which obtains them, performance drops by a factor of 4, so that's not a realistic option. In order to keep performance sane, we have to hold on to those locks for the span of several operations.

(16) By anonymous on 2022-11-18 17:36:25 in reply to 15 [link] [source]

In order to alleviate contention, we also have to release that handle in a timely manner. There's no way around that.

This is true. I'm trying to show you there might be an easy way to do this without a timeout.

Unfortunately, that doesn't solve the problem. That was our first attempt, but there are sqlite internals which do not call xLock/xUnlock, which is why the other APIs have to use getSyncHandle() at all.

[...]

When sqlite's internals call xLock and xUnlock, the OPFS layer unconditionally releases the lock in xUnlock. When the internals don't call xLock/xUnlock, the OPFS proxy has no choice but to either hold on to the lock forever or release it after some delay.

I think you're missing a key detail in my post. Yes, SQLite doesn't call xLock/xUnlock on journal files, and journal files can be opened by multiple connections (when journal_mode is TRUNCATE or PERSIST). The suggested change is to release the access handle on the journal file when its database file is unlocked. Then acquiring the access handle on a journal file will always succeed (except in exotic conditions, like issues in the underlying platform file system).
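For illustration, the suggested change might look something like this sketch. The openFiles map and its fields are hypothetical stand-ins, not the actual VFS internals.

```javascript
// When xUnlock() releases the database file's lock, also close any sync
// access handle still held for that database's journal file, so another
// tab can acquire it immediately instead of waiting out a timeout.
async function xUnlockDb(openFiles, dbName) {
  const entry = openFiles.get(dbName);
  if (!entry) return;
  entry.locked = false;
  if (entry.journalHandle) {
    // close() relinquishes the OPFS-level lock on the journal file.
    await entry.journalHandle.close();
    entry.journalHandle = null;
  }
}
```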

(17) By Stephan Beal (stephan) on 2022-11-18 23:53:24 in reply to 16 [link] [source]

I think you're missing a key detail in my post

That is most certainly possible - i am a newbie in the subtleties of the vfs API.

The suggested change is to release the access handle on the journal file when its database file is unlocked.

To the best of my admittedly limited knowledge, the vfs does not have enough information to know that unless, perhaps, it guesses based on file names, which feels like an ugly hack. Until we have proven that the current approach is insufficient, such hacks don't seem justified. i will keep it in mind, however.

(18) By Stephan Beal (stephan) on 2022-11-21 05:47:58 in reply to 12 [link] [source]

I think you should try using the Web Locks API to implement SQLite locking.

After creating a test app with which to tinker with OPFS concurrency...

  1. The truly atrocious lack of concurrency has been improved significantly but is still light years away from desktop-app-grade concurrency. Whether it can be further improved with the current OPFS API is an open question. The highly unpredictable timing of cross-thread messaging is a factor, as is OPFS's inability to unambiguously distinguish locking errors from any other errors. There may well be other factors as yet undiscovered.

  2. Web Locks work along the lines of acquireALock(callThisCallbackWithinThatLock), which is inappropriate for sqlite's purposes. MDN demonstrates an alternative usage which takes a promise and holds the lock until the promise settles, but it's not yet clear whether we would be able to make use of that technique to simulate the workflow we need.
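For reference, the promise-holding technique reads roughly like this sketch. acquireHeldLock is a hypothetical name, and a minimal single-process stand-in is substituted for navigator.locks so the sketch is self-contained outside a browser.

```javascript
// In a browser this would be navigator.locks; the fallback here is a
// trivial promise-queue lock manager for demonstration only.
const locks =
  (typeof navigator !== "undefined" && navigator.locks) || {
    _q: Promise.resolve(),
    request(name, cb) {
      const r = this._q.then(() => cb());
      this._q = r.catch(() => {});
      return r;
    },
  };

// Resolves, once the lock is granted, to a release() function; the lock
// is then held until release() is called - a shape that maps onto an
// xLock()/xUnlock() pair better than the plain callback form.
function acquireHeldLock(name) {
  let release;
  const held = new Promise((resolve) => { release = resolve; });
  return new Promise((granted) => {
    locks.request(name, () => {
      granted(release); // hand the unlock function to the caller
      return held;      // keep the lock until release() runs
    });
  });
}
```

Under this shape, xLock() would await acquireHeldLock(fileName) and stash the release function for the later xUnlock() call.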

Experimentation will continue as time and energy allow, but high-grade concurrency in a browser environment is currently looking to be a pain point. Three workers/tabs performing small updates every second or so is (mostly) fine, but once in a while a timing quirk will cause one of them to fail spuriously. Since we're unable to unambiguously distinguish OPFS lock errors from other errors, we're limited in how we can deal with those internally beyond assuming that every failed attempt to grab a sync access handle is a locking error and retrying a set number of times. If that fails, sqlite will typically (depending on context) propagate such failures as I/O errors.

(3) By Stephan Beal (stephan) on 2022-11-10 11:43:26 in reply to 1 [link] [source]

I'm running into a case where I'm making a single write (create table if not exists ...) and trying to open the db in two tabs.

Just FYI, i suspect that i know what's causing this but adding a workaround for it is costing us a 400% performance hit in our benchmarks. If you are able, please replace your copy of sqlite3-opfs-async-proxy.js with the one from the opfs-lock-without-xlock branch and report whether that alleviates the contention. If it doesn't, this is a dead end approach. If it does, there might be a way to reduce that 4x performance hit significantly and experimentation can continue along those lines.

(4) By mlaw (tantaman) on 2022-11-10 13:02:06 in reply to 3 [link] [source]

it "works" although every write retries several times before finally succeeding. Writes take several seconds to complete 😅

Example: https://www.loom.com/share/a64c1e53e94843469df89276d2c61364

(5) By Stephan Beal (stephan) on 2022-11-10 13:27:53 in reply to 4 [link] [source]

it "works" although every write retries several times before finally succeeding. Writes take several seconds to complete 😅

Thank you for testing! While you were doing so i modified the locking strategy a bit to eliminate the huge performance hit reported in my previous post and merged that into trunk. Would you mind trying with the latest trunk, please? You'll just need to replace that one file again. Several others were modified in the corresponding checkins, but nothing which would affect your test.

It would not surprise me if writes now take roughly 0.5-1 second to complete, but it is also possible that you may still be seeing a several-second delay.

(6.1) By mlaw (tantaman) on 2022-11-10 20:08:25 edited from 6.0 in reply to 5 [link] [source]

Haven't tried your latest changes yet but another thought did occur to me.

If SQLite could work in a SharedWorker then we could use that to share the SQLite instance across tabs.

Unfortunately SharedArrayBuffer seems to be unavailable in SharedWorkers.

(7) By Stephan Beal (stephan) on 2022-11-11 11:54:44 in reply to 6.1 [link] [source]

Unfortunately SharedArrayBuffer seems to be unavailable in SharedWorkers.

We looked into SharedWorker early on (middle of this year) but it was not yet in all three of the Big Three Browsers so it didn't even get a second look. The outlier browser just added support for it in v16 not quite two months ago, so we'll certainly return to that topic at some point, but not in the immediate future.

We make every reasonable effort to be portable to browsers from the past two-ish years, the baseline being Sept. 2020, when the 3rd browser vendor added BigInt support (something we need for APIs which use int64). OPFS support is an exception in that regard but it was the possibility of using OPFS which got this whole project started.

(8) By mlaw (tantaman) on 2022-11-11 12:23:31 in reply to 5 [link] [source]

Roy mentioned that wa-sqlite doesn't have contention issues with OPFS. Haven't dug into his code yet to understand how that can be the case but might be worth a look.

(9) By Stephan Beal (stephan) on 2022-11-11 12:45:07 in reply to 8 [link] [source]

Haven't dug into his code yet to understand how that can be the case but might be worth a look.

i will do so as time allows. Setting up a multi-worker contention test like he mentions just got bumped up my TODO list, as that's much easier to create and test than multi-tab tests and is hypothetically equivalent in terms of locking.

That said: having 5+ tabs all using the same OPFS-hosted db feels like it falls into the category of telling your doctor "it hurts when I do this," to which he can only respond "then don't do that."

(10) By mlaw (tantaman) on 2022-11-11 13:20:59 in reply to 9 [link] [source]

That said: having 5+ tabs all using the same OPFS-hosted db feels like it falls into the category of telling your doctor "it hurts when I do this," to which he can only respond "then don't do that."

I agree with you but you never know what your end user is going to do. They have the power to make tabs and point all those tabs at the same site so someone will do just that and expect it to work.

Although that does make a different workaround occur to me:

  • Listen for when the tab is no longer active and disconnect from the DB.

It has drawbacks though. I can imagine a use case where in one tab the user is editing data and in the other tab they're previewing the changes caused by those edits.

(11) By Stephan Beal (stephan) on 2022-11-11 13:43:18 in reply to 10 [link] [source]

Listen for when the tab is no longer active and disconnect from the DB.

That unfortunately only works from the main UI thread, whereas OPFS is worker-only:

https://developer.mozilla.org/en-US/docs/Web/API/Document/visibilitychange_event

In order to make use of that we'd have to load components in the main thread, even if sqlite is otherwise loaded in a worker or sub-worker (and in the latter case there would be no direct communication path for the event).
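For illustration, forwarding that main-thread event to a db-hosting worker might look like this sketch. wireVisibilityToWorker and the message shapes are hypothetical; the point is only that the event must be observed on the main thread and relayed.

```javascript
// The main thread listens for visibilitychange (OPFS and the db live in
// a worker, which cannot see that event) and tells the worker to close
// or reopen the db accordingly.
function wireVisibilityToWorker(document, worker) {
  document.addEventListener("visibilitychange", () => {
    worker.postMessage({
      type: document.visibilityState === "hidden" ? "close-db" : "open-db",
    });
  });
}
```

As noted, this only helps when the worker is reachable from the main thread; a sub-worker would need the message relayed a second hop.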

The changes made yesterday "should" increase the concurrency considerably over your initial tests but are still not a magic bullet. As Roy pointed out in one of his posts, the journal does not get xLock()'d. Certain vfs APIs, like xWrite(), require a sync access handle (i.e. a lock), and if they're called without an intervening call to xLock(), they now (since yesterday) acquire what's internally called an auto-lock. All such automatic/implicit locks are relinquished as soon as the db driver goes idle: once it stops receiving API calls for 500ms (that time is likely to change if future testing deems it desirable)1. All explicit locks (xLock() calls) are held until their corresponding xUnlock() call is received.
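The idle-release mechanism described there boils down to a resettable timer, roughly like the sketch below. The names and the parameterized interval are illustrative, not the actual proxy code.

```javascript
// Implicit "auto-locks" are dropped once no API call has arrived for
// some idle interval (500 ms in the current code). releaseAll is a
// stand-in for whatever actually relinquishes the held sync handles.
function makeAutoLockReleaser(releaseAll, idleMs = 500) {
  let timer = null;
  return function touched() {
    // Called on every VFS API call: push the idle deadline back out.
    if (timer) clearTimeout(timer);
    timer = setTimeout(releaseAll, idleMs);
  };
}
```

Any activity keeps the locks alive; the release only fires once the driver has been quiet for the full interval, which is why a busy tab can still starve an idle one.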

Before yesterday's change, such automatic locks were also acquired but never released until the file was closed or an explicit xLock/xUnlock() pair was made on the file handle. The fact that not all API calls are bound by an xLock()/xUnlock() pair wasn't clear to me until a few days ago.


  1. ^ src:/info?name=24d1c1982a012d99&ln=754-766