Network filesystems for sqlite DBs

> Why not use one of the ones included with SQLite, so we can all play along?

Two reasons: (1) it didn't occur to me, and (2) I didn't know about them :)

> What realistic scenario does a pure overwrite test simulate?
> DB applications are usually read heavy.
> Even a data logging application is going to be appending pages to the DB, not continually overwriting the same DB page
> Further, your application is perversely reimplementing AUTOINCREMENT. What possible correlate to a real-world application does this model?

In this case, the database holds a lot of different types of data. Quite a bit of it is stored once and only read later, and I didn't really see that as the problem. We also have a fair amount of persistent state for a client application: state that is constantly being modified to reflect what the user is doing, and which they expect to be preserved if they move away from the native app and use the same account on the web app. The real state is obviously not a simple counter, but its flags are being set and reset constantly, and the counter in the test lets me tell whether the end result is what I'd expect it to be.

It seemed like a reasonable analogue of the kind of workload I thought the database might have problems with.
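For concreteness, the test was doing something along these lines (a rough Python sketch; the path, table, and counter column are stand-ins for the real schema):

```python
import sqlite3

# Rough sketch of the overwrite test described above: read a counter,
# increment it, write it back, and check at the end that no update
# went missing. Path and schema are invented for illustration.
ITERATIONS = 10_000

con = sqlite3.connect("/mnt/nfs/test.db", isolation_level=None, timeout=30)
con.execute("CREATE TABLE IF NOT EXISTS state (id INTEGER PRIMARY KEY, counter INTEGER)")
con.execute("INSERT OR IGNORE INTO state (id, counter) VALUES (1, 0)")

for _ in range(ITERATIONS):
    # BEGIN IMMEDIATE takes the write lock up front, so the whole
    # read-modify-write cycle is one transaction.
    con.execute("BEGIN IMMEDIATE")
    (value,) = con.execute("SELECT counter FROM state WHERE id = 1").fetchone()
    con.execute("UPDATE state SET counter = ? WHERE id = 1", (value + 1,))
    con.execute("COMMIT")

(final,) = con.execute("SELECT counter FROM state WHERE id = 1").fetchone()
print(f"expected {ITERATIONS}, got {final}")  # a mismatch means a lost update
con.close()
```

If two processes on different machines run this loop against the same file and the final count comes up short, the filesystem's locking has failed.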

> Are you suggesting that SQLite's test regime might be insufficient for ensuring that transactions work as they're supposed to?

Well, no, under *normal* circumstances, of course not. It did seem, though, that if database locking was problematic over a network filesystem, this would be a test that would show it up.

> This should have been a huge red flag to you: NFS is 3x faster than a local filesystem!?

It was, in fact. Lacking any other ideas, I'd come to the same conclusion as you did: that the write-heavy environment was the cause. It still seems odd, though.

> You haven't said what filesystem you're running over iSCSI

In this case, it's my home setup (covid and the entire world burning mean I'm working at home), but if it's possible to make it work, the real system would be running on the existing [scaleio](https://en.wikipedia.org/wiki/Dell_EMC_ScaleIO) setup. iSCSI is the closest thing I can get to it for testing out ideas before going through the whole "submit a proposal in triplicate, convene a committee, ..." process.

To answer your question, I'm running Btrfs, which isn't a clustered filesystem AFAIK. I'm not actually sure what the "real" system uses, but it pretty much has to be a clustered filesystem for the existing usage.

> You say "with caching," which suggests you have write-back caching enabled, as opposed to the safer write-thru mode. That's a highly risky cheat in any application that cares about durable transactions unless you've got battery-backed cache RAM to close the RAID write hole.

... which it does. And the house has 3 Tesla batteries on it as well. I laugh in the face of power outages. Well, actually I don't. I quietly smile on the inside when the neighbor comes around asking why my lights are still on in a power outage, and offer to throw a cable over the fence to power his fridge.

> What is your reason for looking into SQLite on network filesystems in the first place?

From the original post:

"I'm trying to figure out if we could use an existing enormous iscsi-type shared filesystem to allow a database-per-person (which would allow easy per-person encryption for privacy) using a bunch of sqlite databases, or go with one humungous database server with everyone's data inside it running with a local disk."

The multi-tenant huge-database approach is the obvious solution, but it doesn't lend itself well to searches across the metadata in a per-user encrypted-data scenario. Encrypting columns of data effectively removes the ability to index them, homomorphic encryption notwithstanding, and not encrypting them leaks privacy.
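A toy illustration of the problem (not our schema; it uses the `cryptography` package's Fernet cipher, which is randomized, so the same plaintext encrypts differently each time):

```python
import sqlite3
from cryptography.fernet import Fernet  # randomized symmetric encryption

# Invented schema and data, purely to show why an index on an
# encrypted column buys you nothing.
key = Fernet.generate_key()
f = Fernet(key)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email BLOB)")
con.execute("CREATE INDEX users_email ON users(email)")  # useless, as shown below

for addr in (b"alice@example.com", b"bob@example.com"):
    con.execute("INSERT INTO users (email) VALUES (?)", (f.encrypt(addr),))

# An indexed equality lookup finds nothing: encrypting the search term
# again produces a different ciphertext than the one stored.
probe = f.encrypt(b"alice@example.com")
print(con.execute("SELECT count(*) FROM users WHERE email = ?", (probe,)).fetchone())  # (0,)

# The only option is to decrypt every row in the application -- a full scan.
matches = [row for (row,) in con.execute("SELECT email FROM users")
           if f.decrypt(row) == b"alice@example.com"]
print(len(matches))  # 1
```

Deterministic encryption would get equality lookups back, but it leaks which rows share a value, and range and prefix queries stay broken either way.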

Conversely, having a bunch of SQLite databases on a shared filesystem accessible to a bunch of nodes makes life a lot easier for the encryption scenario: things like [SqlCipher](https://www.zetetic.net/sqlcipher/) mean the database can be encrypted in toto but still use indexes as if it were not. In my tests, the overhead is only ~5% or so on low-data-volume databases, rising to ~15% on data-heavy tables. Managing a large number of individual databases rather than one large one has its own issues, of course.
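As a sketch of what that looks like (assuming the pysqlcipher3 Python binding; the path, key, and schema here are made-up examples):

```python
# Minimal sketch of the SQLCipher approach: the whole file is encrypted,
# but inside the decrypted view every index works normally.
from pysqlcipher3 import dbapi2 as sqlcipher

con = sqlcipher.connect("/mnt/shared/user-1234.db")
con.execute("PRAGMA key = 'per-user-secret'")  # must run before anything else on the connection
con.execute("CREATE TABLE IF NOT EXISTS mail (id INTEGER PRIMARY KEY, subject TEXT)")
con.execute("CREATE INDEX IF NOT EXISTS mail_subject ON mail(subject)")
con.execute("INSERT INTO mail (subject) VALUES ('hello')")
con.commit()

# This lookup uses the index just as it would in a plain SQLite file,
# because pages are decrypted transparently as they are read.
print(con.execute("SELECT id FROM mail WHERE subject = 'hello'").fetchall())
con.close()
```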

So what I'm trying to do here is the groundwork to see what's even feasible, after which we can move on to the proposal stage.

There are a few things we can do at ingress - one of which would be to divert all traffic for a given user account to a specific node, thus giving us single-node access to any given database - and from your and Keith's responses it sounds like this might be feasible.
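The routing itself could be as simple as a stable hash of the account ID (a hypothetical sketch; the node names and account ID are invented):

```python
import hashlib

# Pinning an account to a node: a stable hash of the account ID picks
# the node, so every request for that user (and hence every open of
# their database) lands on the same machine.
NODES = ["node-a", "node-b", "node-c"]

def node_for(account_id: str) -> str:
    digest = hashlib.sha256(account_id.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

print(node_for("user-1234"))  # always the same node for this account
```

A real deployment would probably want consistent hashing or an explicit account-to-node table, so that adding a node doesn't reshuffle every existing account.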

The "I want it all" approach would be the multiple writer scenario, where read and write requests (the client app can open multiple simultaneous connections to the service) would come in over different load-balanced IP's, thus giving us multiple-writer requirements. Again, from reading both the responses, I need to re-run those tests from different machines (which did occur to me, but only after posting).

> Greybeards on the Internet saying "Don't do that" isn't good enough? :)

As a greybeard myself, albeit in a slightly different sphere, I wish that were true. Unfortunately, when I'm challenged about why we want to do it *this* way, it's best to have an answer somewhat more authoritative than "well, some bloke on the internet, who *sounded* like he knew what he was talking about, said X" :)

I do appreciate the time taken to answer the question, thank you :)