Stress testing shows reading after a commit not always works

(1) By inetic on 2023-03-03 16:52:32 [link] [source]

In advance I would like to apologize that the test is written in Rust using the sqlx wrapper. If that is an issue I could try to reduce it further to C/C++ but by now I'm much less familiar with the C API so it may take a while.

The code in question is here.

In pseudo code, it does this:

1. Create two databases A and B, both with one connection for reading and one for writing (although the one for reading in B isn't used).
2. Create table `a_table` in A and `b_table` in B.
3. Create 200 entries in `a_table` in a single transaction, then commit.
4. In parallel:
   a. Do unrelated write IO on `b_table`
   b. Read the last entry from the `a_table`

The problem is that 4.b. sometimes returns nothing even though the entry has been committed into the table in step #3. Without 4.a. I can't reproduce the issue, but I think it's just there just messto with thread scheduling.

To reproduce the issue I have to run many (200) instances of the above algorithm in parallel. I wrote this shell script to help me with it. I usually need to run the test for up to 10 minutes.

Not sure it's helpful, but my specs are:

Ubuntu 22.04.1 LTS; 64bit; Intel® Core™ i7-4790K CPU @ 4.00GHz × 8; 16GB RAM; The database is stored on disk with Ext4 (version 1.0)

If I'm reading the Rust sqlx dependencies right, latest master sqlx uses libsqlite3-sys version 0.25.1. That in turn uses SQLite version 3.35.4.

But I tried to modify sqlx to use the libqlite3-sys from latest master and that one uses SQLite version 3.41.0 which I believe is currently the latest version and I can still reproduce the bug.

To understand what is happening on the sqlx layer, one only needs to look into the worker.rs file where we can see that a "connection" is basically a thread that awaits for Commands and then executes them. Note that after a succesfull commit, a confirmation is sent back to the main thread so the step #4 really happens only after the commit has finished.

(2) By Keith Medcalf (kmedcalf) on 2023-03-03 19:40:17 in reply to 1 [link] [source]

The only way that could happen is if you have an incomplete statement (that is, a transaction is open) on the connection on which 4b is running, and your database is in WAL journalling mode. "Fresh" data will not be seen unless immediately prior to attempting the read there is no transaction in progress. This is called repeatable-read and is a property of all ACID databases.

Simply test whether or not this is the case before step 4b.

If the Rust wrapper you are using does not expose the sqlite3_txn_state API, then you will have to write your own method for accessing this information.

It is quite likely that your Rust Wrapper is playing with your connections' transaction state behind your back. This is quite common and bog standard behaviour unless wrapper-writer specifically declares that they are following the SQLite3 API and not diddling behind your back.

(3) By inetic on 2023-03-06 10:06:23 in reply to 2 [link] [source]

The only way that could happen is if you have an incomplete statement (that is, a transaction is open) on the connection on which 4b is running, and your database is in WAL journalling mode.

I can reproduce this issue even with SqliteJournalMode::Delete, so based on what you wrote, this would indicate either that flag is not being passed correctly to SQLite by the sqlx wrapper, or there is indeed a bug in SQLite?

If the Rust wrapper you are using does not expose the sqlite3_txn_state API, then you will have to write your own method for accessing this information.

Thanks, I'll try that and report back.

(5) By Keith Medcalf (kmedcalf) on 2023-03-06 15:36:53 in reply to 3 [source]

I can reproduce this issue even with SqliteJournalMode::Delete, so based on what you wrote, this would indicate either that flag is not being passed correctly to SQLite by the sqlx wrapper, or there is indeed a bug in SQLite?

A bug is SQLite, while possible, is quite unlikely. In DELETE journal mode you would not be able to COMMIT the transaction if another connection was holding open a read transaction on the database file. The transaction would only commit after all other connections had relinquised their read locks.

In that case, you would have a deadlock. The commit cannot proceed until the reader is finished, and the reader cannot finish because you have lost control of it and access to it. However, this is not happening. Yet you claim the commit is actually occurring (by the data being changed in the database -- not by some hokey assurance by the wrapper that the commit was successful -- it may be lying to you after all).

Since you claim the same thing is happening in both WAL and DELETE journal modes, their are two possible explanations:

The COMMIT has not actually completed yet.
The platform implements locking incoherently. This would usually be caused by such things as having proxies (like Docker or other crap that intermeddles between the OS/filesystem and the application program) or having some sort of non-local filesystem (that is, the filesystem runs on a different machine than the application and filesystem calls are RPC rather than merely local access to a local filesystem with remote block storage).

My vote lies with #1. You are being told that the commit has been completed successfully before it is actually completed successfully. This would be a wrapper behaviour error because SQLite itself does not do this.

(4) By Donal Fellows (dkfellows) on 2023-03-06 13:07:08 in reply to 1 [link] [source]

A few thoughts:

Is the database in WAL mode? (It's persistent, but not enabled by default.)
What is the timeout for acquiring a transaction? I couldn't tell in your test code, but if it is immediate-or-fail then the other parallel connection could block the first and cause failures like that.
I can't see where you commit the transaction using a_pool in the parallel code. I don't know if that matters.
You're doing a lot of very short write transactions with b_pool, and are doing them in very quick succession. That's an ugly load for SQLite.

(7) By inetic on 2023-03-07 16:41:01 in reply to 4 [link] [source]

@dkfellows thanks for the hints, I've been digging inside the sqlx code more since I wrote the issue and I believe I found an issue with one of sqlx's dependency. I wrote a ticket on github/sqlx. I apologize for the noise.

Just for posterity, answers to your questions:

The bug is reproducible with and without WAL
I think 5 seconds
Commiting in on the a_pool in the parallel code is not necessary as it's only a read (not write) operation
Yes, that's not something we do in a non-example code, I just found that having it there helps reproduce the issue

(9) By Donal Fellows (dkfellows) on 2023-03-07 16:59:28 in reply to 7 [link] [source]

Commiting in on the a_pool in the parallel code is not necessary as it's only a read (not write) operation

Read transactions can block write transactions, especially when not in WAL mode. With the code as it stands, I can't tell what the transaction state is at the end of the block; if it is either committed or rolled back (the difference not mattering for a read transaction) then it should be fine, but if it is just left hanging around then who knows? Correct transaction state handling is rather important once you're dealing with non-trivial workloads like in your example, at least in my experience (doing parallel workloads in SQLite and Java).

(6) By Simon Slavin (slavin) on 2023-03-07 12:48:17 in reply to 1 [link] [source]

Using the C API, a SQLite query can return with a result code which indicates a non-successful result, e.g. a timeout because the file is busy. I don't know Rust, and in your code there's nothing obvious which shows handling of a result.

How would your Rust code handle a non-successful result ? Does it detect and report them, or just continue with the next call ?

(8) By inetic on 2023-03-07 16:45:31 in reply to 6 [link] [source]

You're right, if an error happens, this would be shown as an empty vector of values. I have since modified the example to be more precise about what fails. However, after more digging I found it's very likely in another sqlx dependency. I wrote an issue on github/sqlx so I think I'll continue there. Sorry for the noise.