WASM: Error getting sync handle for xWrite

(1) By Ajit (ajitk0) on 2024-06-07 07:52:20 [source]

Hello, first of all thank you so much for creating sqlite and the wasm version for the web. I have been using it in a Chrome extension to persist data using OPFS with sqlite-wasm.

Some of our users have reported the following error:

OPFS asyncer: xwirite(): GetSyncHandleError: Error getting sync handle for xWrite(). 6 attempts failed. distill.sqlite-journal : NotFoundError: A requested file or directory could not be found at the time an operation was processed. sqlite3-opfs-async-proxy.js:85
  at getSyncHandle (sqlite3-opfs-async-proxy.js:313:19)
  at async Object.xWrite [as f] (sqlite3-opfs-async-proxy.is:690:18)
  at async f (sqlite3-opfs-async-proxy.js:854:19) {fid: 664208, filenameAbs: 'distill.sqlite-journal', filenamePart: 'distill.sqlite-journal', dirHandle: FileSystemDirectoryHandle, fileHandle: FileSystemFileHandle, ... }

Error updating table: {type: 'error', dbId: 'db#1@660656', messageId: 'exec#18490', workerReceivedTime: 52955564.5, workerRespondTime: 52960112.29999995, ... }

SQLite version is:

SQLITE_VERSION "3.45.1"
SQLITE_VERSION_NUMBER 3045001
SQLITE_SOURCE_ID "2024-01-30 16:01:20 e876e51a0ed5c5b3126f52e532044363a014bc594cfefa87ffb5b82257cc467a"

The SQLite db is opened and queried using the worker api:

workerPromiser('open', {
  filename: 'file:distill.sqlite?vfs=opfs'
});

I have tried to reproduce the error by making multiple concurrent insert and update requests but couldn't so far.

Any idea what could be wrong with some of these environments?

(2) By Stephan Beal (stephan) on 2024-06-07 11:02:42 in reply to 1 [link] [source]

GetSyncHandleError: Error getting sync handle for xWrite(). 6 attempts failed.

That means that that the file handle is already opened and unavailable to us. OPFS currently exclusively locks a file when it is opened. The message you're seeing is the VFS's attempt to work around that by waiting for access, but it can't wait for long before giving up.

I have tried to reproduce the error by making multiple concurrent insert and update requests but couldn't so far.

How many workers can reliably access the same OPFS db file is highly dependent on both the environment and the workload. That particular VFS can typically handle a small handful of concurrent connections on a desktop-grade machine if each one uses the databases in very small transactions. As soon as any given worker keeps a transaction open longer than the VFS's wait time, it will start to fail in the way you're experiencing.

I have been using it in a Chrome extension to persist data using OPFS with sqlite-wasm.

PS: we have no insight about the implications of running this API in a browser extension, as they're browser-specific and we very willfully avoid browser-specific corners of JS.

db#1@660656

And for what it's worth, i very strongly recommend against using the Worker1 API for anything more than toy demo applications. That API has very serious limitations which the other approaches to using the library do not, as described in their docs.

On the surface, that API is easy to use and convenient, but its limitations very quickly get in the way of any apps more complex than toys.

(3) By Ajit (ajitk0) on 2024-06-07 16:45:41 in reply to 2 [link] [source]

Thank you for the response and suggestions.

GetSyncHandleError: Error getting sync handle for xWrite(). 6 attempts failed.

That means that that the file handle is already opened and unavailable to us. OPFS currently exclusively locks a file when it is opened.

Thanks for the note. Is `NotFoundError error expected when the file has already been opened and not available? I expected it be be NoModificationAllowedError when an already locked file is being accessed.

How many workers can reliably access the same OPFS db file is highly dependent on both the environment and the workload.

I looked into possibility of multiple workers trying to access the same DB. The Chrome extension initializes SQLite worker using an offscreen document. The offscreen document creates only one worker as soon as it loads.

Chrome ensures (as per their docs) that a maximum of one instance of offscreen document is active at any point of time. Using extension API to create more than one document threw error. Is it possible that an improperly terminated worker can lead to such a state?

PS: we have no insight about the implications of running this API in a browser extension, as they're browser-specific and we very willfully avoid browser-specific corners of JS.

Sounds fair because of ever growing API surface area. I saw others using sqlite-wasm in extensions and chose it based on initial success. Using it in an extension is very useful for offering a fast experience to users.

i very strongly recommend against using the Worker1 API for anything more than toy demo applications.

Thanks for the recommendation. What would you recommend for using a non-blocking API, creating our own worker using sqlite3 for more control?

(4) By Roy Hashimoto (rhashimoto) on 2024-06-07 20:16:52 in reply to 2 [link] [source]

I think this might be a bug. Note that the error is apparently on the journal file and not the database file, and that error is NotFoundError (as OP notes). So this is probably a call to xOpen() with SQLITE_OPEN_MAINJOURNAL and SQLITE_OPEN_READONLY.

I don't see a way for your VFS to return SQLITE_CANT_OPEN in the case where the open fails. This can happen if xAccess() says that the journal file exists but the file has been deleted by the time xOpen() is called.

(5) By Stephan Beal (stephan) on 2024-06-08 10:18:44 in reply to 3 [link] [source]

Is `NotFoundError error expected when the file has already been opened and not available? I expected it be be NoModificationAllowedError when an already locked file is being accessed.

i admittedly overlooked the "NotFound" part because the code which throws that exception is waiting on a lock to free up, not waiting on the file to appear.

The offscreen document creates only one worker as soon as it loads.

If you do not need concurrency, i'd recommend the other OPFS VFS, as it performs much better:

wasm:/doc/trunk/persistence.md#vfs-opfs-sahpool

What would you recommend for using a non-blocking API, creating our own worker using sqlite3 for more control?

My recommendation is to load the sqlite library into your thread (whichever thread that is) and access it directly, as opposed to via a proxy running in another thread. The only time, AFAIK, that's not possible is if your application is running in the main thread and you need OPFS (which cannot run in the main thread). That's a niche the worker1 fills, but its limitations make it more appealing (to me) to refactor the app to run in a worker thread.

From Roy:

I don't see a way for your VFS to return SQLITE_CANT_OPEN in the case where the open fails. This can happen if xAccess() says that the journal file exists but the file has been deleted by the time xOpen() is called.

My current suspicion is a race condition in the wait-for-unlock bit, which i'll investigate. That the case of "file deleted while we wait for a lock" has never (AFAIR) been reported before seems somewhat odd in retrospect.

(6) By Ajit (ajitk0) on 2024-06-17 12:47:18 in reply to 5 [link] [source]

We have moved to opfs-sahpool vfs. The performance is noticeably better. Thanks!

Is this vfs also likely to reduce chances of encountering the bug?

That the case of "file deleted while we wait for a lock" has never (AFAIR) been reported before seems somewhat odd in retrospect.

We started receiving reports only after wider deployment of sqlite-wasm.There are couple of reports in the following thread: https://forums.distill.io/t/chrome-extension-watchlist-not-loading/3464/10?u=ajitk.

(7) By Stephan Beal (stephan) on 2024-06-17 13:12:38 in reply to 6 [link] [source]

We have moved to opfs-sahpool vfs. The performance is noticeably better. Thanks!

Just be aware that it's a trade-off. For a browser extension concurrent use isn't an issue, so the sahpool VFS is a hands-down winner performance-wise (3-4x faster in most of my non-trivial tests, less so with tiny workloads). Much (but not all) of the performance bottleneck in the other OPFS VFS is related to trying to shoehorn concurrency on top of a filesystem which doesn't really support it. Upcoming planned changes should reduce that performance gap measurably, but won't do away with it entirely.

That the case of "file deleted while we wait for a lock" has never (AFAIR) been reported before seems somewhat odd in retrospect.

That particular problem was resolved last week (for a given definition of "resolved"). In such cases it should now properly report that the file is not there, as opposed to reporting a generic I/O error, which will (depending on the context) give the library core more leeway in how to handle it. That fix is queued for the 3.47 and any potential 3.46.x patch releases.

Thank you for your report and feedback!

(8) By Ajit (ajitk0) on 2024-06-19 10:25:29 in reply to 7 [link] [source]

That particular problem was resolved last week (for a given definition of "resolved"). ... give the library core more leeway in how to handle it.

Will the library now try to handle SQLITE_NOTFOUND error instead of raising the original error?

That fix is queued for the 3.47 and any potential 3.46.x patch releases.

When are they expected to be released? I am assuming that there are other fixes going to be added to the queue. Do you recommend that we build the branch-3.46 to get early access to the fix until then?

Thanks!

(9) By Stephan Beal (stephan) on 2024-06-19 10:59:34 in reply to 8 [link] [source]

Will the library now try to handle SQLITE_NOTFOUND error instead of raising the original error?

It depends on the context, actually. That error can hypothetically be thrown from just about any API call in the VFS where it needs to have a lock on the file. The core library, if my memory serves correctly, will handle your particular case if it happens via xOpen() (when initially opening a file). If that case shows up via any other API call it likely indicates severe API misuse, e.g. somehow deleting the db from one worker while using it in another.

When are they expected to be released?

There is no current release date for 3.47 and there are no current plans for a 3.46 patch release. Patch releases are always created on an as-needed basis, depending largely on the number and severity of fixes.

I am assuming that there are other fixes going to be added to the queue.

For 3.46, any checkins added to branch-3.46 since its initial release are implicitly queued for the next patch release. Likewise for the trunk and the next non-patch release.

Do you recommend that we build the branch-3.46 to get early access to the fix until then?

We always encourage folks to work from the trunk version rather than waiting on releases. The prelease snapshot of the WASM bits has just been updated to the trunk version, in case that will help you.

(10) By Ajit (ajitk0) on 2024-06-20 01:53:19 in reply to 9 [link] [source]

Thanks, this helps :)