"database is locked": detecting deadlock vs. busy timeout?

(1) By anonymous on 2023-06-21 15:09:19 [link] [source]

I'm looking at a logfile showing a delete failed with SQL_BUSY. The db is in WAL mode with a 100 ms busy timeout. This table is accessed by two threads. Only one thread writes to this table, and that's the thread that logged the error.

After reading a bunch of stuff, I ran across this forum post:

https://sqlite.org/forum/forumpost/e51c0e031ceb51dc

Is there a way (extended error code, perhaps?) to tell if the call returned immediately to prevent a deadlock, or if the call timed out?

I think that in my case the read only thread couldn't have created a deadlock and that maybe the delete would have worked with a longer timeout.

Thanks in advance for any helpful comments.

(2) By Keith Medcalf (kmedcalf) on 2023-06-21 16:01:41 in reply to 1 [source]

By "thread" do you mean connection? Please use the word "connection" where you mean connection. You cannot access anything at all in the database without a connection, and you apparently have none.

That is to say a useful description might be something like:

I have two threads. Each thread has its own connection to the same database file which is in WAL journaling mode. One of the connections writes to a table in the database via its connection. It logged an SQL_BUSY when trying to delete a record.

(4) By anonymous on 2023-06-21 16:16:32 in reply to 2 [link] [source]

Sorry for the confusion.

Yes, there are two threads, each with a connection to the same database file.

The writing thread started a transaction with "BEGIN TRANSACTION" which did not return an error. Then it tried "DELETE FROM tablename" and that's when the 'database is locked' error was logged.

The table is roughly 20 rows x 50 columns. It's read once/second by the reading thread, and written once/hour or so by the writing thread.

We've never seen this before, and the code is running 12 hours/day at multiple locations for the last couple of years, so I'm concerned I'm looking at very intermittent bug in my code.

(5) By cj (sqlitening) on 2023-06-21 17:29:21 in reply to 4 [link] [source]

I get SQL_BUSY using "Begin" or "Begin Immediate".
I have never received a SQL_BUSY using "Begin Exclusive" using SQLitening.

(6) By Keith Medcalf (kmedcalf) on 2023-06-21 17:39:13 in reply to 4 [link] [source]

The writing thread started a transaction with "BEGIN TRANSACTION" which did not return an error.

BEGIN TRANSACTION does not start a transaction. It merely turns off the autocommit machinery. You say "database is locked" was logged. Which "database is locked" is that? Status code 5 or status code 6?

https://sqlite.org/lockingv3.html#transaction_control

Since you are issuing un unconstrained delete, is you application cache at least twice the size of your database?

It is also possible that you are trying to delete that which is in use by the other connection.

(7) By anonymous on 2023-06-21 17:58:32 in reply to 6 [link] [source]

Which "database is locked" is that? Status code 5 or status code 6?

Sadly, my code doesn't log the actual error value, just the output from sqlite3_errmsg(), which is "database is locked". sqlite3_errstr() returns "database is locked" for 5, and "database table is locked" for 6, which suggests my logged error is status code 5.

It is also possible that you are trying to delete that which is in use by the other connection.

If that's the case, then increasing the busy timeout value might help, right? Since the read thread connection isn't trying to write to the table, it shouldn't be a potential deadlock?

Since you are issuing un unconstrained delete, is you application cache at least twice the size of your database?

Not sure. I just ran PRAGMA cache_size from the CLI on the db and it returned the default of -2000. The on-disk size of the db file is roughly 200 kB.

(9) By Keith Medcalf (kmedcalf) on 2023-06-21 18:15:02 in reply to 7 [link] [source]

Are you certain that the database is in WAL journal-mode?

(10) By anonymous on 2023-06-21 18:22:55 in reply to 9 [link] [source]

Are you certain that the database is in WAL journal-mode?

It should be obvious by the sketchy details of my bug that I'm certain of nothing at this point!

The code executes a "PRAGMA journal_mode=WAL;" at startup, and that call does not return with an error.

Looking at the filesystem, I see a .db file, a .db-wal file, and a .db-shm file, all with recent timestamps.

Opening the live database file with the CLI tool and running "pragma journal_mode;" returns "wal".

(11) By Keith Medcalf (kmedcalf) on 2023-06-21 18:58:50 in reply to 10 [link] [source]

The code executes a "PRAGMA journal_mode=WAL;" at startup, and that call does not return with an error.

So you do not bother retrieving the singleton result which is the journal mode?

Opening the live database file with the CLI tool and running "pragma journal_mode;" returns "wal".

Ok. So you have something else that already has a write in progress against the database file on some other connection somewhere and you are being told that SQLite is BUSY deqaling with that at the current time, please try your write operation later, after the current lock holder has reelinquised its lock.

(12) By anonymous on 2023-06-21 19:31:03 in reply to 11 [link] [source]

So you do not bother retrieving the singleton result which is the journal mode?

No. Perhaps in the future I'll add that to the log.

So you have something else that already has a write in progress against the database file on some other connection somewhere

I don't see how that's possible, unless sqlite is writing to the table without my knowledge. The read thread doesn't insert or delete or update, only select.

I truly appreciate all the help in this thread. It's been educational. I'll mod the code to use a busy handler callback so that I can discriminate between timeouts and deadlocks and improve how I retry failed transactions.

(13) By Keith Medcalf (kmedcalf) on 2023-06-21 19:59:51 in reply to 12 [link] [source]

If you are sure there are only two connections to the database file and that one of them should never write, then issue pragma query_only=1 on that connection. If it attempts to write it will have an error.

(14) By cj (sqlitening) on 2023-06-21 20:10:18 in reply to 13 [link] [source]

Here are some portions of SQLitening handling of %SQLITE3_BUSY which might be useful.
It would be nice if you only need to increase "sqlite3_busy_timeout" ?
Curious why "begin exclusive" doesn't immediately invoke %SQLITE_BUSY for other connections.
I'm done.

DECLARE SUB  UsingSetBusyTimeout CDECL(BYVAL rhDab AS DWORD, BYVAL rlWaitMS AS LONG) 

'Open 

irGetRutAddress "sqlite3_busy_timeout", lhRutAddr
CALL DWORD lhRutAddr USING UsingSetBusyTimeout(thDab, llA)

'Exe statements
DO
  CALL DWORD thCallExe USING UsingExe(thDab, STRPTR(rsStatement), 0, IIF&(INSTR(rsModChars, "R"), 1, 0), 0) TO tlLastError
  IF tlLastError AND ISFALSE irCheckIfBusy("Exe " & rsStatement) THEN GOTO ErrorRut
LOOP WHILE tlLastError = %SQLITE3_BUSY  

FUNCTION irCheckIfBusy(rsStatement AS STRING) AS LONG
'   Returns %True if was busy and should try again
   IF tlLastError <> %SQLITE3_BUSY OR ISTRUE tuDbFlags.Bits.DoNotRetryIfBusy THEN EXIT FUNCTION
   IF MSGBOX(LEFT$(rsStatement, 255) & STRING$(2, $CR) & "The database is currently locked.  Want to try again?", %MB_ICONQUESTION OR %MB_YESNO OR %MB_SYSTEMMODAL, $Title) = %IDNO THEN EXIT FUNCTION

   FUNCTION = %True
END FUNCTION

(15) By Keith Medcalf (kmedcalf) on 2023-06-21 21:01:06 in reply to 14 [link] [source]

Curious why "begin exclusive" doesn't immediately invoke %SQLITE_BUSY for other connections.

Firstly, because the other connections need to try an actually do something. If there is no request to do anything, there is is no way to signal SQLITE_BUSY.

Secondly, you should read the documentation. https://sqlite.org/lang_transaction.html

In particular the following:

EXCLUSIVE is similar to IMMEDIATE in that a write transaction is started immediately. EXCLUSIVE and IMMEDIATE are the same in WAL mode, but in other journaling modes, EXCLUSIVE prevents other database connections from reading the database while the transaction is underway.

(17) By anonymous on 2024-06-12 10:22:23 in reply to 15 [link] [source]

This documentation doesn't make sense to me, would you be able to clarify it?

As far as I understand, if the database is not in WAL mode you can't have readers concurrent with a writer, so whether EXCLUSIVE or IMMEDIATE other database connections would not be able to read the database would they?

So I'd have thought wal mode would be the only one where exclusive/immediate made a difference, but the documentation says the opposite. What gives? What is missing from my understanding?

(18.1) By Nuno Cruces (ncruces) on 2024-06-12 17:50:29 edited from 18.0 in reply to 17 [link] [source]

An IMMEDIATE writer locks out other writers, and starts writing to it's own private page cache.

Only once it's either (1) ready to commit, or (2) writes so many pages that the cache needs to spill to disk, will it try to acquire the EXCLUSIVE lock, locking out readers.

In WAL mode EXCLUSIVE acts like IMMEDIATE (it locks out writers, not readers and writers). I WAL mode, a transaction “never” blocks readers.

(19) By anonymous on 2024-06-13 07:20:57 in reply to 18.1 [link] [source]

OK you don't actually explain what BEGIN EXCLUSIVE does but from what you're saying I'm assuming it means instead of only acquiring a RESERVED lock, BEGIN EXCLUSIVE will immediately acquire the EXCLUSIVE lock?

I think the bit I was missing is I had no idea RESERVED still allowed concurrent readers, and that this is what a writer in non-wal mode would acquire until commit or cache spill.

And so IMMEDIATE and EXCLUSIVE have the same behaviour in WAL because... there's no EXCLUSIVE lock in WAL mode, so that's not a state which can be transitioned to.

Does that sound right?

(20) By Stephan Beal (stephan) on 2024-06-13 08:57:37 in reply to 19 [link] [source]

I'm assuming it means ...

There's no need to assume, as the locking behavior and the varieties of BEGIN are documented at www:/lockingv3.html and www:/lang_transaction.html. If you find shortcomings in those docs, please point out what's not clear and we can look at updating them.

(21) By anonymous on 2024-06-13 09:19:55 in reply to 20 [link] [source]

To me the issue was (/ is if assumptions are undesirable) that the transaction documentation does not fully explain the behaviours and does not link either transaction mode to the corresponding locks.

So I think you would need to have a good grasp of the details of sqlite locking to intuit where the divergence lays, something I did not have: my rough understanding of non-wal mode was that reads and writes were fully exclusive, as such I was under the incorrect impression that a database would be fully locked from the first write, not just for commit / spill.

Thus the relatively vague hints at the difference between IMMEDIATE and EXCLUSIVE seemed incoherent: "prevents other database connections from reading the database while the transaction is underway" was my model of execution for all non-wal writes.

I think it would be helpful to expand the documentation for the transactions to indicate explicitly that:

IMMEDIATE acquires a RESERVED lock (?) which blocks other writers but not readers until spill / commit
EXCLUSIVE acquires an EXCLUSIVE lock (?), blocking both readers and writers entirely until completion (or abort)

They should also probably link to the locking modes document (and even the locking modes themselves): currently the transaction control document is a relatively high level document which I'd expect most users to read as soon as they need transactions while the locking docs feels more like low level details most important for very precise work.

(22.2) By Nuno Cruces (ncruces) on 2024-06-13 10:20:34 edited from 22.1 in reply to 21 [link] [source]

The thing is that locking is an implementation detail.

BEGIN DEFERRED starts a transaction and ensures consistent reads. BEGIN IMMEDIATE starts a write transaction, and ensures it can conclude (barring I/O errors): it won't be prevented from writing by a concurrent connection invalidating its reads. BEGIN EXCLUSIVE starts a write transaction and prevents BEGIN DEFERRED readers from getting into a situation where they can't become writers.

I've explained their semantics with no mention of file locking, which is up to the VFS.

You can have a VFS that always locks exclusively (there are a few in the SQLite tree). You can have a VFS that never locks at all and simply corrupts files. Or one that is more optimistic about concurrency and locks somewhat less. I have a VFS for which a every lock except SHARED is the same (so BEGIN IMMEDIATE is the same as BEGIN EXCLUSIVE).

All of these influence how much concurrency you get from a database, but none should break transaction semantics (barring IO errors).

(23) By anonymous on 2024-06-13 11:47:20 in reply to 22.2 [link] [source]

The thing is that locking is an implementation detail.

Sure.

Point is to actually explains the (minimum) guarantees and tradeoffs of the different transaction modes in ways the current documentation does not. Your explanation paragraph does do so, it's great.

(3) By anonymous on 2023-06-21 16:02:06 in reply to 1 [link] [source]

You need to install your own busy handler callback so you can tell whether or not sleeping was attempted.

(8) By anonymous on 2023-06-21 18:00:45 in reply to 3 [link] [source]

Thanks, this is very helpful! I've got some hacked up test code that can detect the difference between a forced deadlock and busy wait.

(16) By Donal Fellows (dkfellows) on 2023-06-22 08:59:44 in reply to 1 [link] [source]

In the case where you have a reader that only reads and a writer that only writes (i.e., no transaction upgrading; reading within the write is fine) then a timeout is entirely sufficient provided that all write transactions are kept as short as possible. Assuming a sensible timeout length, of course (I'd guess something like 10 times the average write transaction time would be enough). SQLite is not optimized for write-heavy loads from more than one connection; if you're doing that, you don't want more than a single writer (and might disable syncs for most of the time; that's a totally different part of optimisation).

The cause of actual deadlocks is when you have two (or more) connections that both want to upgrade their current shared (read) transactions to exclusive (write) transactions at the same time, as each will wait indefinitely for the other to release its shared transaction; I think there's a timeout but it isn't one you want to hit as it is really mystifying when you hit it. This is why upgrading transactions is so dangerous. The workaround I've found is to only ever do activity that may write within the scope of an exclusive transaction and never to allow them to occur without having explicitly started a transaction (rules that I enforce in my wrapper around SQLite); with them in place I don't see any deadlocks, even when I'm hitting the DB pretty hard with a parallel workload with many small writes (a parallelized test suite I can't meaningfully share because of other non-trivial dependencies). Avoiding this requires understanding the detail of your queries in any code that may write to the DB.