Network filesystems for sqlite DBs

> I wrote a small test, the guts of which are:

Why not use one of the ones included with SQLite, so we can all play along?

* [`test/kvtest.c`](https://www.sqlite.org/src/file/test/kvtest.c) from the "[35% Faster Than The Filesystem](https://www.sqlite.org/fasterthanfs.html)" article.

* [`test/speedtest1.c`](https://www.sqlite.org/src/file/test/speedtest1.c)

> `"UPDATE counter set value = ?", value +1`

What realistic scenario does a pure overwrite test simulate?

DB applications are usually read-heavy.

Even a data logging application is going to be appending pages to the DB, not continually overwriting the same DB page.

Further, your application is perversely reimplementing AUTOINCREMENT. What possible correlate to a real-world application does this model?
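
For contrast, here is roughly what a data-logging workload looks like against the C API. This is only a sketch; the schema and the `log.db` file name are made up for illustration. The point is that it appends rows and lets SQLite assign the rowid, rather than hand-maintaining a counter in a single hot page:

```c
/* Sketch of an append-style logging workload (hypothetical schema).
** Each insert adds a new row; SQLite assigns the rowid, so there is no
** counter to read-modify-write, and the table grows instead of
** rewriting the same page forever. */
#include <sqlite3.h>

int main(void){
  sqlite3 *db;
  sqlite3_stmt *ins;
  if( sqlite3_open("log.db", &db) != SQLITE_OK ) return 1;
  sqlite3_exec(db,
      "CREATE TABLE IF NOT EXISTS log("
      "  id INTEGER PRIMARY KEY,"   /* alias for the rowid */
      "  t  REAL,"
      "  v  TEXT)", 0, 0, 0);
  sqlite3_prepare_v2(db,
      "INSERT INTO log(t, v) VALUES (julianday('now'), ?)", -1, &ins, 0);
  for(int i = 0; i < 1000; i++){
    sqlite3_bind_text(ins, 1, "sample", -1, SQLITE_STATIC);
    sqlite3_step(ins);              /* appends; never overwrites a row */
    sqlite3_reset(ins);
  }
  sqlite3_finalize(ins);
  sqlite3_close(db);
  return 0;
}
```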

> The idea is to introduce as much contention as possible to see if the transaction locking logic breaks down under stressful circumstances.

Are you suggesting that [SQLite's test regime](https://sqlite.org/testing.html) might be insufficient for ensuring that transactions work as they're supposed to?

> Local single NVME SSD:...2095.206688 secs

This result is likely due to your constant overwriting of a single DB page. If SSD is a sensible choice for a given application, it's because it's read-heavy. A write-heavy DB should not be put on an SSD unless you've done the math and figured out that the [lifetime SSD endurance rating](https://blog.westerndigital.com/ssd-endurance-speeds-feeds-needs/) of the disk will suffice for your application. One write per second is a very different application from 10k writes per second.
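
As a rough back-of-envelope (assuming SQLite's default 4 KiB page size and ignoring write amplification): 10k single-page writes per second is about 40 MB/s, or roughly 3.5 TB written per day, which exhausts a few-hundred-TBW consumer SSD within months. At 1 write per second you're writing on the order of 0.35 GB per day, which the drive will never notice.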

> NFS...742.014655 secs

This should have been a huge red flag to you: NFS is 3x faster than a local filesystem!?

It doesn't pass any kind of smell test you can come up with. Contemplate SATA's 6 Gbit/sec rating, for example, which is the bottleneck they had to invent NVMe to get past. Now you're saying a 1 Gbit/sec NFS link is faster?
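
Putting rough numbers on it: SATA's 6 Gbit/sec line rate works out to about 600 MB/sec of payload after 8b/10b encoding, while a 1 Gbit/sec Ethernet link tops out around 110–120 MB/sec before you even account for NFS and TCP overhead. The network is the narrower pipe by roughly a factor of five.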

> iScsi

You haven't said what filesystem you're running over iSCSI. [You can't share a filesystem over iSCSI without using a clustering filesystem](https://serverfault.com/q/633556/18096), and I'm guessing you aren't. If you were, I'd expect to see the naked clustered filesystem in a separate test.

Unless your usage model is that only a single computer is accessing this DB over iSCSI, I doubt this test is telling you anything useful.

> Areca RAID is a lot faster than any network, as expected.

You say "with caching," which suggests you have [write-back caching](https://www.geeksforgeeks.org/write-through-and-write-back-in-cache/) enabled, as opposed to the safer write-thru mode. That's a highly risky cheat in any application that cares about durable transactions unless you've got battery-backed cache RAM to close the [RAID write hole](https://en.wikipedia.org/wiki/RAID#Atomicity).

Contrast your NVMe drive, which is almost certainly honoring SQLite's fsync calls. The Areca RAID is effectively not doing so: it promises the data has been written to disk even when it hasn't. Anything sitting in a cache that isn't battery-backed will be lost if the system reboots unexpectedly.
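
To be explicit about what SQLite itself controls here: with full synchronous mode it issues the sync calls at every commit, but the guarantee only holds if the storage stack actually persists the data when asked. A minimal sketch (the helper name and path handling are mine, not SQLite's):

```c
/* Sketch: SQLite's durability knob. PRAGMA synchronous=FULL makes SQLite
** sync (fsync via the VFS xSync method) at the critical points of each
** commit, but a write-back RAID cache that acknowledges the flush without
** persisting it defeats the guarantee no matter what this is set to. */
#include <sqlite3.h>

int open_durable(const char *path, sqlite3 **pDb){
  int rc = sqlite3_open(path, pDb);
  if( rc != SQLITE_OK ) return rc;
  return sqlite3_exec(*pDb, "PRAGMA synchronous=FULL;", 0, 0, 0);
}
```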

> The final value of the counter after the last worker exited was always what you would hope it to be, even with NFS.

Given that fsync is a kernel syscall and the NFS client is implemented in the kernel on most OSes these days, a single machine is largely testing its own kernel's caching and locking, so I doubt that test shows what you think it does. Until you have 2+ different OS+hardware stacks accessing the same DB, you aren't fully testing ACID over NFS.

> is there a good way to make it more challenging and flush out problems ?

What is your reason for looking into SQLite on network filesystems in the first place?

If it's just so you can put a single box's storage somewhere else, as with NAS or SAN setups, then your test may be approaching something useful, once you address the criticisms above.

If you're trying to get to a point where multiple machines are sharing storage, then your test doesn't show anything useful at all. You need to run it from multiple physical nodes against the same DB to show that.

> if I could create a VFS layer that provided the same semantics but stored to the network, would that be a more reliable route ? 

I think you'd end up reinventing an existing wheel, is what.
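
For a sense of the size of that wheel: SQLite's VFS interface is public, and a network-backed VFS means taking responsibility for the open, locking, and sync semantics yourself. Here's a minimal shim sketch (the "netvfs" name is made up, and this version just forwards everything to the default VFS):

```c
/* Sketch: register a copy of the default VFS under a new name.
** A real network VFS would override xOpen here, plus the xLock/xSync
** methods on the files it opens, to speak to remote storage; that is,
** it would reimplement much of what NFS/iSCSI stacks already do. */
#include <sqlite3.h>

static sqlite3_vfs netvfs;                    /* our shim */

int register_netvfs(void){
  sqlite3_vfs *base = sqlite3_vfs_find(0);    /* platform default VFS */
  if( base==0 ) return SQLITE_ERROR;
  netvfs = *base;                             /* start as a verbatim copy */
  netvfs.zName = "netvfs";                    /* hypothetical name */
  return sqlite3_vfs_register(&netvfs, 0);    /* don't make it the default */
}
```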

> I just like to have something real and tangible to point to, to say "that's why"... :)

Greybeards on the Internet saying "Don't do that" isn't good enough? :)

What, then, is the value of experience?