xCheckReservedLock doc and possible WebAssembly bug

(1) By Roy Hashimoto (rhashimoto) on 2024-06-08 17:25:48 [source]

The documentation for VFS file methods says:

The xCheckReservedLock() method checks whether any database connection, either in this process or in some other process, is holding a RESERVED, PENDING, or EXCLUSIVE lock on the file. It returns true if such a lock exists and false otherwise.

This is not correct. The method should return SQLITE_OK or some error code like SQLITE_IOERR_CHECKRESERVEDLOCK. The existence (or not) of a reserved lock is reported by setting the integer pointed to by the pResOut argument.

The WebAssembly VFS implementation of xCheckReservedLock() looks wrong to me. I think this method will always say that a reserved lock exists because xCheckReservedLock() is only called when the connection holds a shared lock, and this means that an actual hot journal, e.g. left behind by a crash, will never be detected.

Is this VFS tested for hot journal recovery?

(2) By Stephan Beal (stephan) on 2024-06-08 22:20:47 in reply to 1 [link] [source]

I think this method will always say that a reserved lock exists

My recollection is that we intentionally did that a tick shy of 2 years ago because opfs only supported an exclusive lock, and then never felt a need to revisit it. That might, in hindsight, be in conflict with later code which was added to squeeze some concurrency out of it.

Is this VFS tested for hot journal recovery?

It's not explicitly tested for that, a shortcoming i take full responsibility for.

... and this means that an actual hot journal, e.g. left behind by a crash, will never be detected.

i will take a closer look at this in the coming week and either justify what's there or (more likely!) make appropriate changes.

(3) By Roy Hashimoto (rhashimoto) on 2024-06-09 00:31:08 in reply to 2 [link] [source]

My recollection is that we intentionally did that a tick shy of 2 years ago because opfs only supported an exclusive lock, and then never felt a need to revisit it.

My reading is that the current code shouldn't work at all, as it is in practice the same as always reporting true. I think that because there is an implicit exclusive lock, it would be more correct to always report false.

It might be fully correct to always report false with this VFS. The only reason I hesitate to claim that is the mention of this corner case which I don't fully understand...but I speculate won't happen if xUnlock() always succeeds.

(4) By Nuno Cruces (ncruces) on 2024-06-09 09:56:49 in reply to 1 [link] [source]

If this is wrong, that's not the only VFS that is wrong.

See (e.g.) dotlockCheckReservedLock and flockCheckReservedLock in os_unix.c.

(5.1) By Roy Hashimoto (rhashimoto) on 2024-06-09 13:59:19 edited from 5.0 in reply to 4 [link] [source]

Thanks for looking into that! That certainly puts a lot of doubt on my reading. I just don't understand how that passes this test in hasHotJournal() since the caller holds a shared lock.

(6) By Nuno Cruces (ncruces) on 2024-06-09 15:41:50 in reply to 5.1 [link] [source]

I agree: I don't understand either.

Maybe these VFSes aren't tested for hot journal rollback either?

(7) By Roy Hashimoto (rhashimoto) on 2024-06-09 23:02:23 in reply to 6 [link] [source]

Here's a way to create a hot journal on Linux with the 3.46.0 CLI:

$ sqlite3 foo.db
SQLite version 3.46.0 2024-05-23 13:25:27
Enter ".help" for usage hints.
sqlite> PRAGMA cache_size=0;
CREATE TABLE IF NOT EXISTS t(x PRIMARY KEY, y);
INSERT OR REPLACE INTO t VALUES ('foo', 1), ('bar', 2), ('baz', 3);

-- Incomplete write transaction.
BEGIN IMMEDIATE;
WITH RECURSIVE cnt(x) AS
    (SELECT 1 UNION ALL SELECT x+1 FROM cnt LIMIT 10000)
    INSERT OR REPLACE INTO t SELECT x, random() FROM cnt;
sqlite> ^Z
[1]+  Stopped                 /home/roy_hashimoto/sqlite-src-3460000/sqlite3 foo.db
$ kill -9 %1
[1]+  Killed                  /home/roy_hashimoto/sqlite-src-3460000/sqlite3 foo.db
$ ls -al
total 308
drwxrwxrwt 1 root          root            4096 Jun  9 22:34 .
drwxr-xr-x 1 root          root            4096 Jun  9 20:58 ..
-rw-r--r-- 1 roy_hashimoto roy_hashimoto 299008 Jun  9 22:34 foo.db
-rw-r--r-- 1 roy_hashimoto roy_hashimoto  13824 Jun  9 22:34 foo.db-journal
$

If you do the above and then open the database again with the default VFS, it looks like this:

$ sqlite3 foo.db
SQLite version 3.46.0 2024-05-23 13:25:27
Enter ".help" for usage hints.
sqlite> SELECT * FROM t;
foo|1
bar|2
baz|3
sqlite> 

$ ls -al
total 28
drwxrwxrwt 1 root          root           4096 Jun  9 22:37 .
drwxr-xr-x 1 root          root           4096 Jun  9 20:58 ..
-rw-r--r-- 1 roy_hashimoto roy_hashimoto 12288 Jun  9 22:37 foo.db
$

This looks fine - the database file has been restored to the original size, and the journal file is gone.

Set up the hot journal again and then reopen with the unix-dotfile VFS:

$ sqlite3 -vfs unix-dotfile foo.db
SQLite version 3.46.0 2024-05-23 13:25:27
Enter ".help" for usage hints.
sqlite> SELECT * FROM t;
foo|1
bar|2
baz|3
sqlite> PRAGMA page_size;
4096
sqlite> PRAGMA page_count;
3
sqlite> 

$ ls -al
total 312
drwxrwxrwt 1 root          root            4096 Jun  9 22:41 .
drwxr-xr-x 1 root          root            4096 Jun  9 20:58 ..
-rw-r--r-- 1 roy_hashimoto roy_hashimoto 299008 Jun  9 22:40 foo.db
-rw-r--r-- 1 roy_hashimoto roy_hashimoto  13824 Jun  9 22:40 foo.db-journal
$

SQLite doesn't report that the database is malformed, but the state doesn't look right either. The database file size has not been restored and so it is inconsistent with the page_count, and the journal file is still present.

My guess is that the journal wasn't considered hot and so was not played back. It's strange that the database is apparently still considered valid, though. Is it just some quirk of this VFS?

(8) By Nuno Cruces (ncruces) on 2024-06-10 21:34:52 in reply to 7 [link] [source]

I'm guessing a better test would be to use mptest.c on these VFSes. I'll try to do that if I have bandwidth for it.

(9) By Stephan Beal (stephan) on 2024-06-11 12:15:25 in reply to 4 [link] [source]

If this is wrong, that's not the only VFS that is wrong.

Just FYI, we're looking into this behavior for both the unix-dotfile and OPFS VFSes. Thank you both (Nuno and Roy) for the analysis and details.

Would there be any benefit to changing the OPFS VFS to use a dotfile locking system? @Nuno/Roy: your opinions on such a change would be much appreciated.

PS: my activity on this will regretfully be slower than i'd like because of current project-external influences but this topic is #1 on my personal sqlite priority list.

(10) By Roy Hashimoto (rhashimoto) on 2024-06-11 13:30:20 in reply to 9 [link] [source]

Would there be any benefit to changing the OPFS VFS to use a dotfile locking system?

No, I think dotfile gives you all the same problems on another file, and adds a new problem of removing obsolete dotfiles.

I think this VFS currently works on Chrome 108, Firefox 111, and Safari 17. You should use the Web Locks API.

Note that you also have access to BroadcastChannel at these versions so for extra credit you could do lazy unlocking to close the OPFS access handle only when another connection broadcasts a request. This should be more performant than your current approach with an idle timer.

(15.1) By Nuno Cruces (ncruces) on 2024-06-12 15:41:13 edited from 15.0 in reply to 9 [link] [source]

I can't in good faith comment on browser usage.

The fix seems to make sense for unix-dotfile (which is very useful for WASI usage!)

I have to say I find it highly suspicious if unix-flock is OK. My reading of flockCheckReservedLock is that it removes the file lock if it is ever called.

We never enter this branch, because flockCheckReservedLock is always called under a SHARED_LOCK:

  /* Check if a thread in this process holds such a lock */
  if( pFile->eFileLock>SHARED_LOCK ){
    reserved = 1;
  }

So we always go through this (reserved is always false, acquiring the lock always succeeds because we already had it, and thus we release it, and tell no one about it):

  /* Otherwise see if some other process holds it. */
  if( !reserved ){
    /* attempt to get the lock */
    int lrc = robust_flock(pFile->h, LOCK_EX | LOCK_NB);
    if( !lrc ){
      /* got the lock, unlock it */
      lrc = robust_flock(pFile->h, LOCK_UN);
      if ( lrc ) {
        ...
      }
    } else {
      ...
    }
  }

That's... pretty bad. Unless I'm mistaken, obviously!

(11.1) By Stephan Beal (stephan) on 2024-06-12 13:12:49 edited from 11.0 in reply to 1 [link] [source]

The documentation for VFS file methods says: ... It returns true if such a lock exists and false otherwise. ... This is not correct.

FYI, those docs have been updated to clarify that the "return" in question is via its output pointer argument. We may well have other docs which are similarly lax with the word "return" - please feel free to call any out which you come across.

Status updates in/around that API:

A change to the OPFS xCheckReservedLock() is currently being tested, always "returning" 0, as you suggest, and, incidentally, which lines up closely with the unix-dotfile VFS fix for 99b12eb1af870f0a which is also currently under testing.
The apparent race condition of an OPFS file being deleted while we await a lock is still an open issue at the top of my todo list. As you say in 0ad96e25537d, WebLocks would definitely be a better way to go but we have known clients on Android Chrome and that browser apparently just got WebLocks not quite a month ago (which is frustrating because caniuse.com says that 94.6%+ of the world supports it). That does not rule out WebLocks for a future VFS, in any case (on the contrary, it provides some motivation to get that next one written).

Again: thank you for your continued feedback!

(12) By Roy Hashimoto (rhashimoto) on 2024-06-12 13:28:29 in reply to 11.1 [link] [source]

WebLocks would definitely be a better way to go but we have known clients on Android Chrome and that browser apparently just got WebLocks not quite a month ago

If you're basing this on the caniuse table then I can see why you might think that, but that is incorrect. Chrome on Android got Web Locks at version 69, along with desktop Chrome. The definitive source is chromestatus.com, that's also what MDN says, and I've been using it on my phone for years.

(13) By Stephan Beal (stephan) on 2024-06-12 13:30:36 in reply to 12 [link] [source]

If you're basing this on the caniuse table then I can see why you might think that, but that is incorrect.

Indeed i was. That info changes things, thank you!

(14) By Roy Hashimoto (rhashimoto) on 2024-06-12 14:18:46 in reply to 11.1 [link] [source]

The [apparent race condition of an OPFS file being deleted while we await a lock] is still an open issue at the top of my todo list. As you say in [0ad96e25537d], WebLocks would definitely be a better way to go

I would really like to see you use Web Locks, but I don't think locking is the problem for this other bug. I think what happens is xAccess() detects a journal file, then SQLite tries to xOpen() it (to check if it is a hot journal) but it has already been deleted and the VFS internally throws an error (NotFoundError). So far, so good AFAICT.

The VFS needs to return SQLITE_CANTOPEN here to tell SQLite that is what happened (so it can handle it here). Instead I believe the VFS is returning a generic error code that SQLite can't proceed with. As I see it, the problem is just in the translation of the internal exception to the proper return code.

(16) By Stephan Beal (stephan) on 2024-06-12 15:12:22 in reply to 14 [link] [source]

I would really like to see you use Web Locks

As of a few hours ago, it's my intent to be ported to WebLocks by the time 3.47 comes around (with the usual caveats about Real Life™ interfering).

Instead I believe the VFS is returning a generic error code that SQLite can't proceed with. As I see it, the problem is just in the translation of the internal exception to the proper return code.

That shortcoming will be targeted for repair during the port to WebLocks.

The "hard part" now is wrapping my highly-synchronous-shaped head around how WebLocks slot into place here, but i just happen to know that there's an existing WebLocks-using OPFS VFS over on github which will provide a great study point for me.

Again, thank you for feedback on this.

(17) By Roy Hashimoto (rhashimoto) on 2024-06-12 15:25:53 in reply to 11.1 [link] [source]

A change to the OPFS xCheckReservedLock() is currently being tested

Based on your commit message:

This does not impact any current tests, and may have no direct impact at all because of how that VFS handles locking, but is hypothetically a more correct solution than the previous one.

I don't think you understand the severity of this bug. The impact of the bug is that hot journals cannot be played back in the event of a crash.

(18) By Stephan Beal (stephan) on 2024-06-12 15:53:04 in reply to 17 [link] [source]

The impact of the bug is that hot journals cannot be played back in the event of a crash.

i believe i understand that bit, but its severity is, i naively opine, at least partially mitigated by its (apparent) rarity.

Given that the plan is move to WebLocks before 3.47, and that very few people use the non-npm builds (and the npm builds are always based on a formal release), i'd rather fix that as part of the WebLocks port than produce a more immediate patch which a handful of people might run but which would soon be replaced.

That said: i recognize that your understanding of the wider implications exceeds mine and i'm wide open to suggestions.

(19) By Stephan Beal (stephan) on 2024-06-12 21:59:30 in reply to 18 [link] [source]

The impact of the bug is that hot journals cannot be played back in the event of a crash.

For completeness's sake, a follow-up after an enlightening off-list discussion:

The corruption potential introduced by the OPFS VFS's buggy xCheckReservedLock() implementation is now fixed on trunk and the 3.46.x branch.
The "file is deleted while waiting on a lock" case is a separate issue. It will be patched in the very near future to communicate a more precise result code from the VFS back to the core library (SQLITE_CANTOPEN instead of a generic SQLITE_IOERR).