SQLite Forum

can I insert an 8GB file into sqlite?
> portability

You think a single file with potentially terabytes of content is *portable?*

I mean, houses are portable by the same logic. All you have to do…*all you have to do…* is lift them up on massive hydraulic jacks and put them on a container ship to move them across continents. What's the problem?

I think you'll find that a pile of tens of thousands of gigabyte-scale files is more easily portable, for the same reason that when moving, leaving the house behind and packing up the house's contents makes more sense.

I'm not telling you that you shouldn't use SQLite for terabyte-scale databases. I'm just telling you that if you expect to need to move it around a lot, you'll find that easier with a pile of files, so you can retry broken transfers, use rsync sanely, etc.

> this article which I found very intriguing

Part of what's going on in that article is that the entire DB can get loaded into RAM, and the OS will treat it as a single file, which changes how caching works relative to thousands of separate files.

I doubt its results are very relevant to your particular application.

> avoiding folder count limits

Those limits in filesystems aren't arbitrary: they exist to keep the filesystem fast. It takes too much time to index millions of directory entries, so filesystem designers set a finite limit to encourage you to use hierarchy to keep index sizes sane.
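
If you do end up with a pile of files, the usual way to respect those limits is to fan the files out into hashed subdirectories. A small sketch in TypeScript; the two-level, two-hex-digit layout and the path names are just illustrative choices, not anything the filesystem requires:

```typescript
// Fan-out sketch: spread files across hashed subdirectories so that no single
// directory ever has to index millions of entries. The two-level, two-hex-digit
// layout (256 * 256 = 65,536 buckets) is an illustrative choice, not a rule.
import { createHash } from "node:crypto";
import { join } from "node:path";

function shardedPath(root: string, fileName: string): string {
  const digest = createHash("sha256").update(fileName).digest("hex");
  return join(root, digest.slice(0, 2), digest.slice(2, 4), fileName);
}

// shardedPath("/srv/media", "episode-042.mkv")
//   -> something like "/srv/media/3f/9a/episode-042.mkv" (prefix varies by hash)
```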

The very same forces exist inside a SQLite database. It's just C code, not magic. Millions of DB rows have an indexing overhead, too.

> I plan to write this in typescript

…and you're concerned with *efficiency*?

Decide: do you want the program to be fast *to execute*, or do you want it to be fast *for you to write*? You can't get both.

I'm not saying it's wrong to write your service in TypeScript. For network-facing applications, the latencies and bandwidth limitations of the network often make the inefficiency of the service itself irrelevant, which is why we have so much scripting on the server side. However, you'd best be clear about which one you're choosing and stop worrying about server-side efficiency.

Having said that, realize that when you start talking about scaling the application, efficiency can start mattering again. The tech giants are rewriting Perl, Python, PHP, etc. services in C++, Go, Rust, etc. in order to save money on data centers. Even counting Silicon Valley engineering salaries, it can pay off.

Another common situation is writing efficient services so you can avoid paying a lot in cloud fees. Back when everything was on-premises, taking 10x the development time to get a service to run in 1% of the total system resources versus 10% made no sense, but now that we're packing many customers' loads onto each machine in cloud architectures, that 10x inefficiency goes directly into your hosting fees.

It's the same tradeoff as for the big guys: is it cheaper to pay for the engineering to get it to be 10x faster or is it cheaper to just pay 10x in hosting fees?

Both approaches have their points. You need to think clearly about which approach matters in your particular situation.

> sounds like I am limited to ~2GB max file storage. 

Yes, for a single chunk.

So, you could chunk the files, storing 2 GiB or less per chunk. I'd actually suggest something more on the order of 1 MiB. That's enough to saturate any network buffers, so that you're sure to have the second megabyte of data up in RAM before you finish sending the first. Once you reach that point, there's no benefit to making the chunks bigger if the goal is to serve them over the network.
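
A minimal sketch of what that chunking could look like, assuming Node.js with the better-sqlite3 package; the table and column names are illustrative, not part of anything SQLite prescribes:

```typescript
// Chunked BLOB storage sketch: one row of metadata per file, one row per
// 1 MiB chunk. Assumes Node.js with the better-sqlite3 package; all table
// and column names here are illustrative.
import Database from "better-sqlite3";
import { createReadStream } from "node:fs";

const CHUNK_SIZE = 1024 * 1024; // 1 MiB, as suggested above

const db = new Database("media.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS files (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    size INTEGER NOT NULL
  );
  CREATE TABLE IF NOT EXISTS chunks (
    file_id INTEGER NOT NULL REFERENCES files(id),
    seq     INTEGER NOT NULL,   -- 0-based chunk index
    data    BLOB    NOT NULL,
    PRIMARY KEY (file_id, seq)
  );
`);

// Read a file from disk in 1 MiB pieces and store each piece as its own row.
async function storeFile(path: string, name: string): Promise<number | bigint> {
  const insertFile  = db.prepare("INSERT INTO files (name, size) VALUES (?, 0)");
  const insertChunk = db.prepare("INSERT INTO chunks (file_id, seq, data) VALUES (?, ?, ?)");
  const fileId = insertFile.run(name).lastInsertRowid;

  let seq = 0;
  let size = 0;
  for await (const chunk of createReadStream(path, { highWaterMark: CHUNK_SIZE })) {
    const buf = chunk as Buffer;
    insertChunk.run(fileId, seq++, buf);
    size += buf.length;
  }
  db.prepare("UPDATE files SET size = ? WHERE id = ?").run(size, fileId);
  return fileId;
}

// Yield a stored file back out one chunk at a time, in order, so the whole
// thing never has to sit in memory at once.
function* readFile(fileId: number | bigint): Generator<Buffer> {
  const rows = db.prepare("SELECT data FROM chunks WHERE file_id = ? ORDER BY seq").iterate(fileId);
  for (const row of rows) {
    yield (row as { data: Buffer }).data;
  }
}
```

With 1 MiB chunks, your 8 GB file becomes roughly 8,000 rows, which is nowhere near the point where index overhead starts to hurt.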

Or, you can do as iTunes, Plex, etc. do and store the actual files outside the DB, keeping only the metadata in the DB itself.
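
The schema for that approach shrinks to pure bookkeeping. A rough sketch, again assuming better-sqlite3, with illustrative column names:

```typescript
// Metadata-only sketch: the actual bytes stay on the filesystem; the DB
// records where they live and whatever you want to query about them.
// Assumes better-sqlite3; column names are illustrative.
import Database from "better-sqlite3";
import { statSync } from "node:fs";

const db = new Database("media.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS media (
    id        INTEGER PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,   -- where the file actually lives on disk
    size      INTEGER NOT NULL,       -- bytes, for quick listings
    title     TEXT,
    mime_type TEXT,
    added_at  TEXT NOT NULL DEFAULT (datetime('now'))
  );
`);

// Register a file that already exists on disk; only metadata enters the DB.
function registerFile(path: string, title?: string, mimeType?: string) {
  const { size } = statSync(path);
  return db
    .prepare("INSERT INTO media (path, size, title, mime_type) VALUES (?, ?, ?, ?)")
    .run(path, size, title ?? null, mimeType ?? null).lastInsertRowid;
}
```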