SQLite Forum: Network filesystems for sqlite DBs

Right, so first off, I understand the basic advice is "Don't do it". I'd like to understand the reasons *why* not to do it, and if any of those reasons could be mitigated in any way.

I'm trying to figure out if we could use an existing enormous iscsi-type shared filesystem to allow a database-per-person (which would allow easy per-person encryption for privacy) using a bunch of sqlite databases, or go with one humungous database server with everyone's data inside it running with a local disk. Both approaches have their ups and downs and to that end I wrote a small test, the guts of which are:

````
    /**************************************************************************\
    |* Loop for the number of times, reading and incrementing the count
    \**************************************************************************/
	for (int i=0; i<count; i++)
		{
		mgr->beginTransaction();
		int64_t value = mgr->intForQuery("SELECT value FROM counter");
		mgr->update("UPDATE counter set value = ?", value +1);
		mgr->commit();
		}
````

which would read and write in a loop (yes, I know this could be one statement, I wanted to make sure reads & writes in a  transaction wouldn't be compromised). The loop is constantly obtaining and releasing the lock on the database (mgr->beginTransaction() calls "BEGIN EXCLUSIVE TRANSACTION"). The idea is to introduce as much contention as possible to see if the transaction locking logic breaks down under stressful circumstances.

After all the updating, the code prints out the final value, and the idea is to run more than one instance of these programs in parallel, and see if the last result is (instances) * (count) in the database - ie: was there any time when the database transaction wasn't honoured ? Obviously I don't care about anything other than the last value, because the intermediate values are going to vary based on which program got more access at any given time.

The results were *kind of* interesting - it did better than I expected over a network... Listing them in increasing performance:

````
Local single NVME SSD:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /tmp/test.db
Connected to resource: /tmp/test.db

worker  0 value = 199465 after 2089.432570 secs
worker  1 value = 200000 after 2095.206688 secs



NFS, 1G ethernet:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /archive/tmp/test.db
Connected to resource: /archive/tmp/test.db

worker  1 value = 197756 after 734.510462 secs
worker  0 value = 200000 after 742.014655 secs



iScsi, 1G ethernet:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db

worker  0 value = 175305 after 342.477881 secs
worker  1 value = 200000 after 392.721822 secs



Local 16-disk RAID:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /raid/tmp/test.db
Connected to resource: /raid/tmp/test.db

worker  1 value = 177115 after 69.417077 secs
worker  0 value = 200000 after 77.233248 secs
````

Points of interest:

1) The final value of the counter after the last worker exited was always what you would hope it to be, even with NFS.

2) iScsi is a lot faster than NFS :)

3) Local (16-way with cache, Areca) RAID is a lot faster than any network, as expected.

4) A single local PCIe-based SSD is nowhere near any of the others.


The only one I'm really concerned with here is (1). The speed hit vs local RAID I'm completely comfortable with, it's an adequate trade-off for the situation and the actual deployment would be 10G not 1G ethernet so it's likely to be better than that. The situation doesn't change with multiple programs accessing the same database either, example:

````
[@lyonesse DCtest]$ ./RUNME 100000 5
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db

worker  4 value = 444736 after 921.902477 secs
worker  1 value = 468154 after 968.464125 secs
worker  0 value = 485641 after 1005.294065 secs
worker  3 value = 495642 after 1025.872606 secs
worker  2 value = 500000 after 1034.514943 secs
````


Questions are:

a) Is the above the sort of test I ought to be running in order to try and figure out what areas could have issues, or is there some better way to show up any problem ? 

b) Am I just being lucky that there's no impact on the reliability over the network in the above ? If so, is there a good way to make it more challenging and flush out problems ?

c) Given that SQLite is supposed to support multiple threads/processes accessing the same file on a *local* filesystem, then if I could create a VFS layer that provided the same semantics but stored to the network, would that be a more reliable route ? (Assuming I'm missing something with the above results :)

d) Or is there something more fundamental that I'm just getting lucky with, and the whole idea is a non-starter. If this is the case, I'd love a pointer to somewhere it's explained, or some insight :)

A last note. I'm coming to this with an open mind - if there's a genuine reason why it can't be done, I'm very happy to know that. I just like to have something real and tangible to point to, to say "that's why"... :)