Network filesystems for sqlite DBs

So, just to circle back: I've spent the best part of today getting a clustered filesystem up and running on my little Raspberry Pi [kubernetes](https://0x0000ff.uk/imgs/pi-cluster/from-front.jpg) [cluster](https://0x0000ff.uk/imgs/pi-cluster/from-back.jpg)...

Here's how standard NFS did, served from Btrfs on the Asustor AS6510T to three of the workers in the cluster:

````
@k8-w1:~/dbio$ dst/bin/dctest -u /archive/tmp/test.db -i 1 -c -n 10000
Connected to resource: /archive/tmp/test.db
worker  1 value = 29472 after 698.149864 secs

@k8-w2:~/dbio$ dst/bin/dctest -u /archive/tmp/test.db -i 2 -n 10000
Connected to resource: /archive/tmp/test.db
worker  2 value = 29759 after 703.140837 secs

@k8-w3:~/dbio$ dst/bin/dctest -u /archive/tmp/test.db -i 3 -n 10000
Connected to resource: /archive/tmp/test.db
worker  3 value = 30000 after 704.651374 secs
````
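
For context, dctest is my own little test program: each worker does -n guarded increments of a shared counter, and the "value" it prints is whatever the counter reads when that worker finishes, so the last worker home should always read 30000 for three workers at -n 10000. The per-worker loop is, stripped down, roughly the below (table and column names here are illustrative, not the real ones):

````
/* Stripped-down gist of a dctest worker: N serialized increments of a
** shared counter. Table/column names are illustrative. */
#include <sqlite3.h>
#include <stdio.h>
#include <unistd.h>

static int increment(sqlite3 *db)
{
    /* BEGIN IMMEDIATE grabs the write lock up front, so the
    ** read-modify-write can't interleave with another worker. */
    int rc = sqlite3_exec(db,
        "BEGIN IMMEDIATE;"
        "UPDATE counter SET value = value + 1;"
        "COMMIT;", 0, 0, 0);
    if (rc != SQLITE_OK) sqlite3_exec(db, "ROLLBACK;", 0, 0, 0);
    return rc;
}

int main(int argc, char **argv)
{
    sqlite3 *db;
    if (sqlite3_open(argc > 1 ? argv[1] : "test.db", &db) != SQLITE_OK)
        return 1;
    sqlite3_busy_timeout(db, 10000);    /* retry on SQLITE_BUSY */
    for (int i = 0; i < 10000; i++) {
        while (increment(db) != SQLITE_OK)
            usleep(1000);               /* back off and try again */
    }
    sqlite3_close(db);
    return 0;
}
````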

I then set up the iSCSI target on the AS6510T as an [ocfs2](https://oss.oracle.com/projects/ocfs2/) filesystem (Oracle Cluster File System version 2). I'll spare anyone reading this the tortuous details, but once it was up and running it was rock solid. Configuring the thing isn't exactly straightforward, though.
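
The heart of the configuration is /etc/ocfs2/cluster.conf, which has to be identical on every node, has to list each node under a name matching that machine's hostname, and needs the o2cb service brought up everywhere before the filesystem will mount. Mine looks roughly like the below (the cluster name and IP addresses here are placeholders):

````
cluster:
	node_count = 3
	name = picluster

node:
	ip_port = 7777
	ip_address = 10.0.0.11
	number = 1
	name = k8-w1
	cluster = picluster

node:
	ip_port = 7777
	ip_address = 10.0.0.12
	number = 2
	name = k8-w2
	cluster = picluster

node:
	ip_port = 7777
	ip_address = 10.0.0.13
	number = 3
	name = k8-w3
	cluster = picluster
````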

Running the same program against the iSCSI mount point resulted in core dumps pretty much as soon as two machines were accessing the same file, which wasn't a great start. It turns out that with the kernel personality, only flock() locking is supported, not POSIX locks, which are what SQLite uses by default.
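
You can check which lock types a mount actually grants with a few lines of C before pointing SQLite at it. The probe below is just an illustration (the file path is made up); on my mount it's the fcntl() call that fails:

````
/* Probe a filesystem for flock() and POSIX (fcntl) lock support.
** The path is illustrative. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/iscsi/tmp/lockprobe", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* BSD-style advisory lock: what the unix-flock VFS uses */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        printf("flock(): ok\n");
    else
        printf("flock(): %s\n", strerror(errno));

    /* POSIX byte-range lock: what SQLite's default unix VFS uses */
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: whole file */
    if (fcntl(fd, F_SETLK, &fl) == 0)
        printf("fcntl(F_SETLK): ok\n");
    else
        printf("fcntl(F_SETLK): %s\n", strerror(errno));

    close(fd);
    return 0;
}
````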

So I rebuilt the SQLite amalgamation with -DSQLITE_ENABLE_LOCKING_STYLE, fixed the compile-time errors by adding the missing #includes, and tried again. This time the program ran, but it told me it couldn't open the database file when I passed the "unix-flock" VFS string to sqlite3_open_v2().
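
For anyone following along: the VFS is selected either by the fourth argument to sqlite3_open_v2(), or by a vfs= URI parameter if you open with the SQLITE_OPEN_URI flag. Both forms below should be equivalent:

````
/* Two equivalent ways to ask for a named VFS such as "unix-flock"
** (which only exists once the amalgamation is built with
** -DSQLITE_ENABLE_LOCKING_STYLE). */
#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
    sqlite3 *db;

    /* 1: the VFS name as the fourth argument */
    int rc = sqlite3_open_v2("/iscsi/tmp/test.db", &db,
                             SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                             "unix-flock");
    if (rc != SQLITE_OK)
        fprintf(stderr, "open: %s\n", sqlite3_errmsg(db));
    sqlite3_close(db);

    /* 2: the VFS name as a URI parameter; needs SQLITE_OPEN_URI */
    rc = sqlite3_open_v2("file:/iscsi/tmp/test.db?vfs=unix-flock", &db,
                         SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE |
                         SQLITE_OPEN_URI, NULL);
    if (rc != SQLITE_OK)
        fprintf(stderr, "open: %s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 0;
}
````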

A little frustrated, I tried the other VFS locking implementations, and "unix-dotfile" worked, so I could get the comparison below versus NFS. I daresay it's not as efficient as flock() would have been, but for my purposes it works well enough:

````
@k8-w1:~/dbio$ dst/bin/dctest -u '/iscsi/tmp/test.db?vfs=unix-dotfile' -i 1 -c -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-dotfile
worker  1 value = 10000 after 149.560367 secs

@k8-w3:~/dbio$ dst/bin/dctest -u '/iscsi/tmp/test.db?vfs=unix-dotfile' -i 3 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-dotfile
worker  3 value = 24595 after 455.827016 secs

@k8-w2:~/dbio$ dst/bin/dctest -u '/iscsi/tmp/test.db?vfs=unix-dotfile' -i 2 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-dotfile
worker  2 value = 30000 after 531.742586 secs
````

Points of note:

1) Both NFS and the clustered FS maintain the locking semantics, and I get the correct final result in both cases.

2) For NFS, there is clearly a good spread of the locking/unlocking across the various clients, because the very first worker to return is almost at the end of the count. This seems pretty much ideal, in fact.

3) Something worked a bit differently for the clustered filesystem (it could be the dot-file approach; see the sketch after this list), because the first worker got exclusive access until it had finished, and then workers 2 & 3 fought it out between them.
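
For the curious: dot-file locking boils down to atomically creating a marker file next to the database, holding it for the duration, and unlinking it afterwards. A toy illustration of the principle (this is not SQLite's actual implementation, which lives in os_unix.c, and the lock-file name here is illustrative):

````
/* The essence of dot-file locking: O_EXCL makes create-if-absent
** atomic, so the existence of the marker file IS the lock.
** Illustrative only; not SQLite's os_unix.c code. */
#include <fcntl.h>
#include <unistd.h>

static int dotlock_acquire(const char *lockpath)
{
    int fd = open(lockpath, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) return -1;   /* someone else holds the lock */
    close(fd);               /* the file's existence is the lock */
    return 0;
}

static void dotlock_release(const char *lockpath)
{
    unlink(lockpath);
}

int main(void)
{
    const char *lock = "/iscsi/tmp/test.db.lock";
    while (dotlock_acquire(lock) != 0)
        usleep(1000);        /* spin until the holder unlinks */
    /* ... critical section: do the transaction ... */
    dotlock_release(lock);
    return 0;
}
````

Note that everyone spinning on open() like this has no fairness at all, which would be consistent with the "first one wins" pattern above.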

I did look at using the programs that Warren mentioned (kvtest.c, speedtest1.c), but they didn't seem to be what I want to look at: kvtest is focussed on blobs, which aren't really part of my usage model, and speedtest1 looks to be a timing test rather than a correctness test. I do, however, intend to run one of the test harnesses; TH3 seems like a good candidate.

If I can figure out why "unix-flock" isn't working for me, or perhaps replace the kernel personality of ocfs2 with a userspace one that supports POSIX locks, I'll rerun the above tests and see if there's any speed increase. It's not exactly a priority right now, though.

I am still somewhat encouraged that both methods gave me the result I'd hoped for, which presumably means the NFS locking is implemented correctly, or at least correctly enough for this test. We'll see how it fares against the test harness.

[edit]

So I got the unix-flock VFS to work, basically by patching the changes [here](http://sqlite.1065341.n5.nabble.com/sqlite-users-DSQLITE-ENABLE-LOCKING-STYLE-1-doesn-t-work-on-Linux-Cygwin-td70789.html) into my amalgamation source. Once patched, it compiled without error on the Pis running Ubuntu 20.04 LTS. I guess -DSQLITE_ENABLE_LOCKING_STYLE isn't normally needed on Linux, since most Linux filesystems offer POSIX locking.

I also found [an ocfs2 lock test](https://github.com/ceharris/ocfs2-lock-validation), a command-line tool, which let me verify that the ocfs2 filesystem does support flock() but not POSIX locks.

The results using flock() rather than dot-file locking were in fact speedier:

````
flock() iSCSI

@k8-w1:~/dbio$ dst/bin/dctest -u 'sqlite:/iscsi/tmp/test.db?vfs=unix-flock' -i 1 -c -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-flock
worker  1 value = 15671 after 144.488417 secs

@k8-w2:~/dbio$ dst/bin/dctest -u 'sqlite:/iscsi/tmp/test.db?vfs=unix-flock' -i 2 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-flock
worker  2 value = 29791 after 306.061729 secs

@k8-w3:~/dbio$ dst/bin/dctest -u 'sqlite:/iscsi/tmp/test.db?vfs=unix-flock' -i 3 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-flock
worker  3 value = 30000 after 307.713789 secs
````

Although I'm still seeing the same "first one wins, then the others fight it out" behaviour, I'm still getting the correct answer at the end.

To Warren's earlier point about common code: I noticed there's an 'mptest' subdirectory in the SQLite source tree, which can be used to test parallel access to a database. I'll be giving that a go next.