SQLite User Forum: Thousands of SQLite Databases for each user

On 2021-03-24 17:47:36 anonymous said:

SQLite is not designed for large multi-user distributed applications. But you have described a use case that might match SQLite's sweet spot.

The word "scaling" is often used lightly, like the word "terrabyte". The magical runeword "horizontal scaling" does not by itself make giant workloads function well. That's why you need to look very hard at what is really meant here, and if you in fact are ever likely to have a giant workload. Not many do.

> I see Internet threads online advising against this because it makes it harder to horizontally scale.

The scaling for any one mostly read-only SQLite database is firstly vertical, and horizontal doesn't tend to arise for a long time if ever. Consider the number of concurrent reads you can get from an SQLite node before you even need to feed it more CPU and RAM, and then up until the maximum amount of RAM and CPU it is possible to give that node these days. It's such an enormous number of hits that if you still need to look at horizontal scaling then you probably have lots of ways of paying for it. Horizontal scaling certainly is possible, but by comparison I have never heard of a way of having a single Postgres node coming within one zero of the hitrate of SQLite, because Postgres does different things. From the limited information you provide, the use case is a good match for SQLite.

As a subset of this, it is conceivable that your problem with SQLite becomes that the relatively few writers briefly but noticeably block some of the vast number of readers even with WAL2. That is an area I'm looking at where SQLite is more simplistic than is probably necessary, but even with SQLite as-is there are things you can do to address this.

The requirements and design you have sketched out in the original post plus followups suggest that, even if load is not evenly distributed across each SQLite database, you can still serve billions of SQL queries in total per day (I get that _guess_ by multiplying "thousands of databases" by a conservatively low number of requests per database.) There are assumptions in here such as no merging of data between nodes and no pathological queries etc.

> I don't see why it would be hard to scale

From the information you gave, including that it works with plain files today, probably not. As food for thought, the limiting case of plain files and nearly 100% readers is email, where Maildir and similar systems will scale to very large numbers because there are so few moving parts (or, a short codepath.) I'm not saying that you are better off with plain files, but installing a networked SQL database may decrease your scaling by introducing so many more moving parts. SQLite has a short codepath, which is part of why it can serve so many requests per second.

> as it is just singular files that can be easily replicated.

I suspect that may be an incorrect scaling dimension to be worried about from the information you have provided. "Copying single files" suggests that a given SQLite node is having too many reads to cope, which as I have said is a very high number. It also suggests you will need a loadbalancer/director to switch across nodes, which you may not need in the "thousands of SQLite databases" model although you haven't given much to work with. So I suggest that until you have nodes getting to that very high usage, you shouldn't need to be copying any files.

Dan Shearer