Are the changes done after a transaction is performed always available to all connections to a database?

(1.1) By Leandro Santiago (leandro) on 2021-02-28 12:56:00 edited from 1.0 [link] [source]

Hi all,

in my application, I have multiple connections/thread to a database (I'm on linux, database stored on xfs and ext4) I'm using WAL.

Each thread "owns" one connection. There's one writer thread, which is the only one which writes in the database, doing transactions with usually hundreds of inserts/updates/deletes each.

All the other threads which access the database are read-only connections (mode=ro, and only allowed to perform query calls) with private cache each.

Every time the writer thread successfully commits a transaction, it notifies the reader threads about the updated data and the readers immediately read it after receiving such notification.

I noticed that, when I have only one reader thread/connection, this schema works as expected and the reader thread can always see the updated database (even performing thousands of reads per second). But, when I increase the number of readers, they sometimes appear to see an old version of the database, before the transaction.

I've been relying in the fact that after a transaction is persisted, all connections will "see" the same database, but I confess I don't know if I understood sqlite well or if such guarantee actually exists.

It might also be a bug in my application as well, so before I spend more time debugging it, I'd like to know if sqlite offers such guarantee and, if, not, if you have any advice on how to proceed to achieve such results.

More technical details:

The application itself is written in Go (and it's open source, at https://gitlab.com/lightmeter/controlcenter/ :-)).
The writer thread notifies the reader threads via Go channels, and the readers are organized in a "thread pool", where the thread which will handle the "results" of a transaction chosen arbitrarily by reading the same channel.
Im'm using go-sqlite3 (wrapping sqlite 3.34.0).

Thank you folks in advance for such amazing piece of software :-)

(2) By Larry Brasfield (larrybr) on 2021-02-28 13:55:16 in reply to 1.1 [source]

I think your application would benefit from using SQLite Shared-Cache Mode.

(3) By Leandro Santiago (leandro) on 2021-02-28 14:58:17 in reply to 2 [link] [source]

I remember doing some experiments with the shared cache on the reader connections and notices that it ended up serializing all queries from different connections, much slower than having a share-nothing approach with private cache.

Should such behaviour be expected with the shared cache?

In any case, I did such experiments quite a while ago, so it was probably some bug in my code, so I will try to do some more experiments again.

Do you think the private cache could be related to the original behaviour I experienced?

(4) By Larry Brasfield (larrybr) on 2021-02-28 15:30:30 in reply to 3 [link] [source]

shared cache ... ended up serializing all queries from different connections

During your experiments, did you use "Read-Uncommitted Isolation Mode"? Without that, even read queries are serialized.

Do you think the private cache could be related to the original behaviour I experienced?

I see that as likely, but have not read enough code or docs to see how SQLite determines that a page cache has become stale. But I find it hard to imagine that it would go to the persistent store to determine that, as that would reduce the performance of the cache significantly. Further, I do not see your post #1 as describing a violation of the ACID promise. (No real software could guarantee that all threads in all processes see the same DB at any instant.) Hence, I see a likelihood that the shared cache, (sharable across threads since v3.5.0), will cure the cache staleness issue you are seeing the effects of. And it appears that the read serialization you disliked can be alleviated as described in section 2.2.1 .

You would likely get relief from your stale cache issue by calling sqlite3_db_cacheflush() after doing writes. It's worth an experiment to see how that performs relative to using "Read-Uncommitted Isolation Mode" and a single (shared) cache.

When experimenting, you might increase the cache size when there is only one rather than many that are not shared.

(5) By Keith Medcalf (kmedcalf) on 2021-02-28 16:34:54 in reply to 4 [link] [source]

Shared cache is designed for running SQLite3 on a wrist watch that only has 3 or 4 bytes of memory. It should be used only when absolutely necessary only on platforms that are already "slow as molasses running uphill in an Edmonton winter".

If you have a "real computer" (anything more powerful than a programmable calculator) then shared cache is generally a really bad idea.

(7) By Keith Medcalf (kmedcalf) on 2021-02-28 16:48:39 in reply to 4 [link] [source]

I see that as likely, but have not read enough code or docs to see how SQLite determines that a page cache has become stale. But I find it hard to imagine that it would go to the persistent store to determine that, as that would reduce the performance of the cache significantly.

Since a transaction is required before any read can be made from the database, the action of starting a read transaction, as far as I can see, will check the data_version so that the cache can be marked stale (discarded) before the read transaction is started. Once a read transaction is started the database cannot be updated so there is no need to check the data_version again until the next transaction is started on that connection.

(6) By Keith Medcalf (kmedcalf) on 2021-02-28 16:39:00 in reply to 1.1 [link] [source]

Are you sure that you are actually committing the transaction when you think you are committing the transaction and that Go isn't "helping you" by running the transaction on a separate thread such that it is not actually committed when you think it is?

Are you sure that your "read thread pool" is not "hanging open" a prior transaction?

(9) By Leandro Santiago (leandro) on 2021-03-01 09:35:42 in reply to 6 [link] [source]

I am not sure about either, as I suspect the issue is caused by bugs in my code, but now I know where to start investigating it :-)

The go sql layer is indeed quite "thick" and there might be the case, but I believe it's way more likely to be caused by my application itself.

(8) By Gunter Hick (gunter_hick) on 2021-03-01 07:15:17 in reply to 1.1 [link] [source]

For connection B to see the changes of connection A, and barring all special modes of operation, the following sequence of events has to occur in excatly this order:

1) Connection A has to COMMIT it's transaction (implicitly or explicitly)
2) Connection B has to BEGIN it's transaction (implicitly or explicitly)

This hold true independently of whether the connections are readers or writers. Thus, any reader that starts a transaction before the writer commits it's transaction will see the state of the database as it was before the writer started its transaction - and will continue to do so until it starts a new transaction itself.

What you are seeing is interference from the go wrapper, which is second guessing when to BEGIN and COMMIT transactions.

(10) By Leandro Santiago (leandro) on 2021-03-01 09:36:41 in reply to 8 [link] [source]

Thank you (and all the other folks) for the answers.

From my application's perspective, that's the case, as a read transaction starts (that's guaranteed by the go channel semantics) only after a commit has been performed, but I confess I'll have to debug it to figure out whether that's done correctly.

I noticed that the Go database wrappers are quite "thick", with many layers from the users till the calls to the database itself, and I'll try to understand better how it decides to start and commit a transaction (apart from the explicit "BEGIN" and "COMMIT).