SQLite User Forum

Support for blocking VFS
Login

Support for blocking VFS

(1) By David Matson (dmatson) on 2021-07-07 22:31:49 [link] [source]

We're using SQLite in a component with significant performance and parallelism requirements, and we would like to rely on OS locking rather than retry loops to handle multiple readers/writers. Stated another way, we'd like to find a way to ensure that we never see SQLITE_BUSY returned to the application and instead wait on an OS handle whenever such a condition would occur.

From looking at the source, it appears that the win32 VFS uses OS primitives that would support blocking rather than failing with SQLITE_BUSY when there is contention, but SQLite currently only uses these primitives in non-blocking mode. For example:

/*
** Currently, SQLite never calls the LockFileEx function without wanting the
** call to fail immediately if the lock cannot be obtained.
*/
#ifndef SQLITE_LOCKFILEEX_FLAGS
# define SQLITE_LOCKFILEEX_FLAGS (LOCKFILE_FAIL_IMMEDIATELY)
#endif

I was able to implementing something similar with a wrapper VFS for journal_mode = DELETE, by following the source for the current win32 VFS and replacing the non-blocking calls with blocking calls (I can share the code if that context would clarify things on this thread). However, for WAL mode, some calls to ShmLock appear to be "probes" that require the call to fail immediately if the lock isn't available, including cases with just one thread where the lock is guaranteed not to be available. For non-WAL mode, it looks like a correct implementation can block/wait to acquire the lock, but for WAL mode, the contract appears to prohibit waiting.

Two main questions:

  1. For non-WAL-mode, how feasible would it be to support a blocking mode directly in SQLite? (rather than using a custom VFS, as seems possible today)
  2. For WAL mode, would it be feasible to extend the VFS layer to make it possible to implement such a blocking version via a custom VFS?

Thanks,

David

(2) By anonymous on 2021-07-07 23:22:08 in reply to 1 [link] [source]

Did you see the blocking support for WAL mode databases?

https://www.sqlite.org/src/file?name=doc/wal-lock.md&ci=204dbc15a682125c

(3.1) By David Matson (dmatson) on 2021-07-09 17:58:08 edited from 3.0 in reply to 2 [link] [source]

I had seen some related things in the source. It looks like that is POSIX-only - is that correct? I'm specifically looking for Windows support.

(4) By HowardKapustein (howardk) on 2021-12-07 04:58:55 in reply to 3.1 [source]

Any news on this? It would be very useful for many scenarios

(5) By HowardKapustein (howardk) on 2022-01-06 06:55:10 in reply to 4 [link] [source]

Post-holiday bump

(6) By Richard Hipp (drh) on 2022-01-06 15:25:22 in reply to 5 [link] [source]

We are kinda overwhelmed dealing with other more pressing issues at the moment. Sorry for the delay.

(7) By HowardKapustein (howardk) on 2024-12-06 06:50:04 in reply to 6 [link] [source]

Circling back on this. Any update since 2022?

(investigating an issue and stumbled over this open thread which might be directly helpful with my current headache)

(8.3) By Nuno Cruces (ncruces) on 2024-12-06 11:32:49 edited from 8.2 in reply to 7 [link] [source]

Do you want “block forever” or with a timeout? With a timeout only Apple supports F_SETLKWTIMEOUT, which I guess is why the SQLITE_ENABLE_SETLK_TIMEOUT option hasn't gained a lot of popularity.

The implementation simply loops testing the lock with F_SETLK and sleeping in millisecond increments.

I guess Windows might support it, but my attempts in my Go driver were not very successful. I ended up looping and sleeping in millisecond intervals too.

PS: If you have ideas that effectively allow blocking in millisecond increments for either Linux or Windows, I'd be very interested as I'm using #define SQLITE_ENABLE_SETLK_TIMEOUT 2.

(9) By Dan Kennedy (dan) on 2024-12-06 15:14:17 in reply to 7 [link] [source]

There is experimental support for SQLITE_ENABLE_SETLK_TIMEOUT on a branch here:

https://sqlite.org/src/timeline?r=win32-enable-setlk

Dan.

(10) By HowardKapustein (howardk) on 2024-12-08 07:02:41 in reply to 9 [link] [source]

Do you want “block forever” or with a timeout?

Timeout. I think I can hack in block-forever but the ramifications when things go sideways would be ugly (even if tiny, non-zero and throw enough dice at the wall...)

I'm less interested in block-instead-of-BUSY than in lock acquisition fairness. At least if N threads were waiting in LockFileEx() I have some confidence Windows would apply some degree of fairness to avoid (or at least minimize) starvation. Right now every call call to LockFile is a unique DoYouFeelLuckyPunk shot in the dark. On a system under heavy load the dice are rolled enough I'm seeing some snake eyes :-(

If I'm reading the amalgamated sources right SQLite uses LockFileEx on Windows (on "NT", not Win9x or WinCE. I'm interested in Win7+ so yeah, 'NT') but always with an immediate return

/*
** Currently, SQLite never calls the LockFileEx function without wanting the
** call to fail immediately if the lock cannot be obtained.
*/
#ifndef SQLITE_LOCKFILEEX_FLAGS
# define SQLITE_LOCKFILEEX_FLAGS (LOCKFILE_FAIL_IMMEDIATELY)
#endif

LockFileEx supports timeout and infinite/block-forever but SQLite doesn't take advantage of them. Per LockFileEx docs

The LockFileEx function operates asynchronously if the file handle was opened for asynchronous I/O, unless the LOCKFILE_FAIL_IMMEDIATELY flag is specified. ... If the file handle was not opened for asynchronous I/O and the lock is not available, this call waits until the lock is granted or an error occurs, unless the LOCKFILE_FAIL_IMMEDIATELY flag is specified

winOpen doesn't seem to open the file for asynchronous I/O so telling the compiler /DSQLITE_LOCKFILEEX_FLAGS=LOCKFILE_EXCLUSIVE_LOCK should produce block-forever behavior. I may try that but I'm uncomfortable relying on it (cure-worse-than-the-disease...).

I have my own derivative VFS (win32-longpath plus drop-impersonation / restore in xOpen and few like functions) so I could implement my own xLock instead of delegating to win32-longpath's, but that requires CreateFile is called with FILE_FLAG_OVERLAPPED and (I think) affects all i/o with that handle. I expect a lot more changes would be needed -- xRead, xWrite and more including shared memory handling handled in sqlite3.c and not delegated to sqlite3_io_methods. Or at least a much more complicated puzzle to sift through.

So I'm much more interested to hear if the resident SQLite experts have made any progress on this front since it was mentioned years back.

(11) By Nuno Cruces (ncruces) on 2024-12-08 10:02:31 in reply to 10 [link] [source]

You got your threads crossed. Dan Kennedy just did, see the post you replied to.

I tried this approach before, and my recollection is that it didn't work very well for small the blocking increments that I use #define SQLITE_ENABLE_SETLK_TIMEOUT 2.

But it probably works fine with #define SQLITE_ENABLE_SETLK_TIMEOUT 1.

I will try again, though.

(12) By HowardKapustein (howardk) on 2024-12-10 02:24:17 in reply to 9 [link] [source]

Neat

BTW line 2555

** If parameter pMutex is not NULL, then

Drifted off at the keyboard in the middle of writing the comment? :P

(13) By Nuno Cruces (ncruces) on 2024-12-11 22:56:38 in reply to 9 [link] [source]

Thank you Dan. Ported it to my VFS. It works!

(14) By Nuno Cruces (ncruces) on 2025-06-05 09:39:15 in reply to 9 [link] [source]

Hi Dan! I see this got merged to main for 3.50, as well as sqlite3_setlk_timeout.

I've been struggling to enable this in my custom VFS, and have so far failed.

I do #define SQLITE_ENABLE_SETLK_TIMEOUT 2 which triggers single millisecond timeouts (that help blocking locks coexist with busy handlers) but Windows will randomly enter into a situation where it can't acquire locks with such short timeouts, and tests that would otherwise succeed under heave concurrency become flaky and fail to make progress.

I was really hopeful when I read your comment that you'd found my issue (that waiting indefinitely for LOCKFILE_FAIL_IMMEDIATELY would fix this, despite documentation saying otherwise), but alas no such luck.

I'm now convinced that WaitForSingleObject is basically incompatible with short timeouts. If I raise the minimum timeout to Window's timer resolution (16ms) the issue goes away. So, just a heads up for you and others, that this newly checked in code might cause issues with #define SQLITE_ENABLE_SETLK_TIMEOUT 2 (this only happens - infrequently - with a dozen concurrent connections hammering a WAL database).

I can report that blocking locks, and specifically #define SQLITE_ENABLE_SETLK_TIMEOUT 2, do work fine with macOS's F_SETLKWTIMEOUT.

Final question, is using SQLITE_SETLK_BLOCK_ON_CONNECT supposed to block indefinitely, or does it respect the ms? Looking at the source code, it seemed to be an indefinite wait.

Thanks!

(15) By Dan Kennedy (dan) on 2025-06-05 11:27:03 in reply to 14 [link] [source]

I do #define SQLITE_ENABLE_SETLK_TIMEOUT 2 which triggers single millisecond timeouts (that help blocking locks coexist with busy handlers) but Windows will randomly enter into a situation where it can't acquire locks with such short timeouts, and tests that would otherwise succeed under heave concurrency become flaky and fail to make progress.

We fixed a problem with SQLITE_ENABLE_SETLK_TIMEOUT=2 on windows just a few days ago here. See the changes to os_win.c around line 2727 in the following:

https://sqlite.org/src/info/7f9c0cdd0630a41d

With SETLK_TIMEOUT=2, the VFS is supposed to block for 1ms then return SQLITE_BUSY if it cannot obtain the lock. Then SQLite calls its busy-handler and attempts the lock again as normal. The idea is that this may help boost the OS priority of the process holding the lock. But os_win.c was actually returning SQLITE_BUSY_TIMEOUT in this case, which meant SQLite was returning the error to the caller immediately instead of invoking the busy-handler and retrying.

Final question, is using SQLITE_SETLK_BLOCK_ON_CONNECT supposed to block indefinitely, or does it respect the ms? Looking at the source code, it seemed to be an indefinite wait.

It's supposed to wait indefinitely. We can't use a timeout for that lock because it's on the database file, and we don't want to open the database file for overlapped IO, which is required for a blocking lock with a timeout.

Dan.

(16.1) By Nuno Cruces (ncruces) on 2025-06-06 09:08:52 edited from 16.0 in reply to 15 [link] [source]

Thank you for your answer Dan.

From my experience, it still seems to me that, at scale, doing LockEx, then WaitForSingeObject(1ms), then CancelIo, then try again, can't be made to work.

It kinda works, but under stress it randomly degenerates into no one making progress. My stress test involves 5000 connections, 8 at a time, 9 readers per writer, short transactions, from multiple processes (so all locks are on the filesystem, no in-memory locks).

To be clear, this is not with your os_win.c VFS, and can definitely be my own fault/bug.

But it works flawlessly (as in, always makes progress, and the test runs to completion every time with thousands of runs) with timeouts of 15ms, and starts failing increasingly more frequently as I reduce the timeouts down from 5ms to 1ms. It also works fine with polling locks (no timeouts, no blocking locks).

(17) By Dan Kennedy (dan) on 2025-06-06 11:09:33 in reply to 16.1 [link] [source]

Sounds like we should just remove the ENABLE_SETLK_TIMEOUT=2 support from os_win.c. If it's not helpful when the system is under stress, what's the point of it?

Dan.

(18) By Nuno Cruces (ncruces) on 2025-06-06 12:50:12 in reply to 17 [link] [source]

Right. I don't really know the answer. Whether this is a problem may also depend on the busy handler you use.

My busy handler checks to see if "waiting/sleeping" has already happened, and only sleeps if it hasn't.

So it really becomes a "busy" loop of:

LockEx()
WaitForSingeObject(1ms)
CancelIo()

Maybe things eventually make progress if the loop is:

LockEx()
WaitForSingeObject(1ms)
CancelIo()
Sleep(1 … 250ms) // this is the default busy handler

I don't know if it'd make sense to have a minimum of 15ms for WaitForSingeObject, and if that'd make sense with ENABLE_SETLK_TIMEOUT=2. The advantage of ENABLE_SETLK_TIMEOUT=2 IMO, is that it allows the app to periodically cancel the wait with a custom busy handler, without completely degenerating into simple polling.