Multithreaded reads and writes to a single SQlite database using the C API

(1) By anonymous on 2021-01-05 01:16:34 [link] [source]

I am researching using SQLite as a storage mechanism for saving and loading files. The catch is that there will be several threads trying to read and write records (a file plus some metadata) to/from the database.

It's not important if read/write requests are queued and there is a small delay but it is important (of course) that the files are read/written correctly and that they are read/written in the order the request was made.

I understand that serialized mode is what I want but I am not clear if I need to build in a separate synchronization mechanism in my code or if each of my threads can read/write as necessary and SQLite will take care of it for me.

Thank you.

(2) By Richard Hipp (drh) on 2021-01-05 01:37:24 in reply to 1 [link] [source]

SQLite should serialized the writes for you, unless you take active and deliberate steps to prevent it from doing so.

(3) By anonymous on 2021-01-05 01:42:30 in reply to 2 [link] [source]

Understood Richard - thank you.

Perhaps my next step is to write some code that creates some threads, each of which randomly reads and writes a record with a well specified size/contents over a meaningful period of time and see if anything breaks.

(4) By Keith Medcalf (kmedcalf) on 2021-01-05 02:11:21 in reply to 1 [link] [source]

if I need to build in a separate synchronization mechanism in my code or if each of my threads can read/write as necessary and SQLite will take care of it for me

Threads are irrelevant. Isolation and Serialization operates between connections. Although entry on a connection by multiple threads are serialized by default, this is an entirely different thing than the isolation and transaction serialization about which you are concerned.

(5) By anonymous on 2021-01-05 02:14:56 in reply to 4 [link] [source]

Keith - I will have a better understanding of what you mean when I write some code (about to start) - but if each thread calls open, select, read/write, close code then will my approach work? IE, there is a new DB connection each time and not shared between the threads.

(6) By Warren Young (wyoung) on 2021-01-05 02:29:04 in reply to 5 [link] [source]

Keith is suggesting that you wrap DB updates that must land together in a single transaction. All writes will hit together, or none of them will.

(7) By Keith Medcalf (kmedcalf) on 2021-01-05 04:28:06 in reply to 5 [link] [source]

Warren's comment is also applicable.

No, specifically I mean that there is no isolation between threads, only between connections. So if you have one connection accessed by 300 threads, then all those threads share one isolation context -- the single connection -- and if one thread begins a transaction on that connection, then all threads using that connection are participants in that transaction and any thread may commit the transaction on that connection at any time -- and all threads sharing the connection will be able to see "uncommitted data". There is no isolation between threads -- only between connections.

However, only one thread may be executing in a connection at a time. That means that if one thread executes a command on a connection, then for the duration of that thread executing "inside the library", any other thread attempting to enter the library "on a different thread" will have to wait. In other words the SQLite3 library is serially entrant per connection, not multiple-entrant per connection. This is a consequence of how it is built and cannot be changed. Note that a "statement" is a child of a "connection" and as such "statements" are not isolated from each other -- it is at the parent connection level that isolation occurs.

You do not have to keep opening/closing connections, but you probably want each thread to open and use its own connection so that the processing on each thread is not visible to (or have impact on) other threads/connections until that thread/connection commits its changes. If you do this SQLite3 will arbitrate each threads/connections access and view of the database individually, which is I think what you mean.

(8) By anonymous on 2021-01-05 17:53:55 in reply to 7 [link] [source]

Ok - thank you for everyone's help but I'm hopelessly lost.

Clearly I need to do more homework before asking in here next time or just give up altogether - I feel like I am not clever enough to grasp the basic concepts.

(9) By Warren Young (wyoung) on 2021-01-05 19:22:41 in reply to 8 [link] [source]

Threads are evil. Rather than spend a bunch of time trying to think through all of the potential problems up front, then a bunch more afterward debugging all of the failure cases you weren't brilliant enough to consider up front, justify why you need to use threads in the first place.

If you really do need threads, then justify why more than one of them needs to write to the DB in parallel.

And if you really do need multiple parallel writers, then wrap everything that needs to hit the DB for the whole operation to be considered "complete" in an explicit SQLite transaction, like I recommended up-thread. Then the only question of ordering will be which transaction gets committed, when.

(10) By anonymous on 2021-01-05 22:24:44 in reply to 9 [link] [source]

Thank you for the encouragement.

I am still trying very hard to formulate how I should approach this but the essence of what I need to do is a make a cache.

When a file load operation is requested by the main application, I look in the database to see if it's there. If it is, I read it from the database into memory and pass it back. If it's not there, I download it from a remote source and write it into the database and pass it back.

The catch of course is that there are multiple threads in the main application, requesting these operations in parallel.

The reason there are multiple threads is for performance - the more files that can be downloaded in parallel, the more performant the system will be.

In a typical session, there will be hundreds or even (low) thousands of such requests.

Integrating my code into the application will be challenging - the existing code that does what I am working on is mind boggling complex. Consequently, I have been trying to write a test that does all I need, can be tested in place and then eventually ported into the main application.

(11.1) By Warren Young (wyoung) on 2021-01-05 23:03:26 edited from 11.0 in reply to 10 [source]

In that case, you don't need overlapping writes at all. Each new file you download gets a new ID, and the UI process(es) blocking on that file's appearance from the remote source don't learn the ID until the file's saved to the DB. No other part of the UI will refer to that ID, either.

This sort of design is called write once, read many.

If DB records are ever deleted, it should only be after all UI bits that depend on them are destroyed.

You don't need multiple DB-writing threads to make this work. I can only see a need for 3 threads at most:

A UI thread
An event-driven I/O pump
A DB writer, called by the I/O pump when the last block of a downloaded file arrives, which does the actual DB insertion from the partial blob data temporarily stored elsewhere. That creates a file ID, which you enqueue for the UI thread, which pulls it off on its next repaint, which causes a DB read of the just-created record.

It's possible to do all of this single-threaded, but that depends on having a UI framework with an event-driven I/O feature, which not all do. If you stated your UI platform, someone expert in it might be able to advise. (e.g. Win32, with WSAAsyncSelect().)

(12) By anonymous on 2021-01-06 00:47:03 in reply to 11.1 [link] [source]

Thank you Warren - plenty to think about there and experiment with but sounds hopeful. I will post here when I have some code to show.