Concurrency for in memory database

(1) By Simone Mosciatti (siscia) on 2021-01-10 19:36:43 [source]

Hi all,

I have a particular use case where I would like to have multiple readers from an in memory database.

I tried two different approach, both with their complexity:

multiple threads sharing the same connection
multiple threads each with its own connection, open as a shared cached. (file:some-random-stuff?mode=memory&cache=shared)

Unfortunately it seems that even for ready only workload, the thread don't proceed in parallel.

Is that possible? Or should I expect all the threads to run in parallel?

Best regards

(2) By Keith Medcalf (kmedcalf) on 2021-01-10 23:32:59 in reply to 1 [link] [source]

multiple threads sharing the same connection

This will never work because a connection is only serially re-entrant. Only a single thread can be executing "inside" a connection at a time.

multiple threads each with its own connection, open as a shared cached.

Although shared cache can do this through the magic of illusion, shared cache was designed for really itty bitty boxes where CPU execution is measured in seconds per instruction and RAM is measured in single-digits of bytes. It is not usually not suitable for use except in environments where performance is not a consideration or a possibility.

Use a proper connection-per-thread and put the database on a shared memory device such as a RAMDISK or on Linux /dev/shm (or the equivalent for whatever OS you are using).

Note that in almost all cases there will be something which is inherently serial and that the illusion of concurrent threads is almost always entirely that: an illusion.

(3) By anonymous on 2021-01-10 23:53:29 in reply to 1 [link] [source]

Consider disabling the shared cache. And also, if possible for your use case, turn memory stats collection off using the SQLITE_CONFIG_MEMSTATUS configuration option

(4) By Simone Mosciatti (siscia) on 2021-01-11 11:28:01 in reply to 2 [link] [source]

Thanks!

Basically you are saying that parallel reads against an SQLite (in-memory) database are basically impossible.

And the solution you are suggesting is a connection per thread, without shared cached.

However, I should not expect a linear increase in performance but some marginal increase.

You did not mention the WAL journal, it is something I would consider in your opinion?

Thanks a lot!

(5) By Simone Mosciatti (siscia) on 2021-01-11 11:28:19 in reply to 3 [link] [source]

Thanks a lot!

(6) By Keith Medcalf (kmedcalf) on 2021-01-11 14:00:54 in reply to 4 [link] [source]

Basically you are saying that parallel reads against an SQLite (in-memory) database are basically impossible.

SQLite3 does not really support the concept of an "in memory database". Things that are "in memory" are not databases, they are temp stores. I suppose that one could write a VFS that allowed a database to be stored in memory but currently that is not really the case. A database exists on a filesystem. You can create a filesystem that is resident in memory (that is called a RAMDISK or you can use the /dev/shm on Linux or whatever it is called on the Operating System you are using) and put a database there.

However, I should not expect a linear increase in performance but some marginal increase.

For some definition of marginal, that is correct. This is how all multileaved processing works. You add more "leavings" to the mix until either (a) the management overhead of the "leavings management" outweighs the advantage gained by adding the "leaving" or some resource under contention is 100% utilized. The objective of "Performance Management" is to obtain maximize the utilization of each resource in contention while minimizing the impact on each individual "leaving" compared to that "leaving" having exclusive access to all resources. This is called the "Multiprogramming Ratio". Given a set of "leavings" (processes/jobs) X the "multiprogramming ratio" is the ratio between the elapsed time taken to run all the components of X side-by-each (concurrently) vs one-after-each (serially). The objective is to "adjust" the contents of the set X such that the MPR is maximized while the impact on any particular component of X is still acceptable.

This is the same process as is used to determine how many elevators are required to service an office or apartment building. You can not spend money to buy elevators and have people queued up waiting for the insufficient capacity all the time and thus not manage to accomplish very much of anything at all because everyone is always waiting days for the elevators that are already full, or you can go crazy and buy one elevator per office/apartment and have them unused 99.999999% of the time. The "trick" is to spend just enough money for just enough elevators that most of them are used to their maximum capacity most of the time without anyone having to wait longer for service than they would if there was no competition for elevator rides (or, more reasonably, that no one wants to rent an apartment or office there because it sucks).

(7) By Rowan Worth (sqweek) on 2021-01-11 16:22:49 in reply to 1 [link] [source]

You might want to check this one:

https://sqlite.org/threadsafe.html

But in the default serialized mode I'm pretty sure the mutexes involved are per-connection, so I wouldn't expect the threading mode to inhibit multiple threads which are using separate connections.

It's possible that in-memory databases don't follow exactly the same locking protocol (which is usually based on file locking primitives) and need to be more conservative about reader access. Normally the database pages held in memory provide a kind of private "staging area" for the connection to collect modifications before committing them to the database proper, I imagine in-memory databases don't have this extra layer of insulation between connections (especially in shared-cache mode).