SQLite User Forum

Experience with SQLite as a store of Files and Images

(1) By oezmaster on 2022-06-03 19:38:35 [source]

I am working on an application in C# where I have to manage images, usually in TIFF format and usually 1–2 MB in size.

Eventually I might also have to store Word documents and the like.

Is it feasible to use SQLite as a store of such images and files, or should I consider other options (like document DBs or the filesystem)? As this is a locally run, relatively small application, an embedded DB like SQLite seemed the most natural choice.

I was also inspired by this article: https://www.sqlite.org/affcase1.html and: https://www.sqlite.org/fasterthanfs.html

What bugs me is that I might have to work with binary data in C#, which I have little experience with. The file contents ought to be stored as BLOBs, if I understood correctly.
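The BLOB round-trip is less work than it sounds. Here is a minimal Python sketch (table and file names are made up for illustration); the C# equivalent is the same pattern, binding a byte[] as the parameter value:

```python
import sqlite3

# Minimal sketch: store a file's raw bytes as a BLOB and read them back.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images(name TEXT PRIMARY KEY, data BLOB)")

payload = bytes(range(256)) * 4  # stand-in for TIFF file contents
con.execute("INSERT INTO images(name, data) VALUES (?, ?)",
            ("scan1.tiff", payload))
con.commit()

(blob,) = con.execute("SELECT data FROM images WHERE name = ?",
                      ("scan1.tiff",)).fetchone()
assert blob == payload
```

In a real application `payload` would come from reading the file in binary mode; nothing else changes.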

Further, I would appreciate any case studies of OSS (preferably .NET) that use SQLite as their only source of data.

(2) By Stephan Beal (stephan) on 2022-06-03 22:41:41 in reply to 1 [link] [source]

Is it feasible to use SQLite as a store of such Images and Files...

You may not be aware of this, but the Fossil source control system, which manages sqlite's own source code, stores all of the version-controlled content, including all files in many different formats (plain text and binary), in sqlite. That is to say: sqlite excels at storing files of all sorts, within its size constraints (a max of just under 2GB per file).

The site you're looking at right now is served by that same SCM (from an sqlite db), but the forum parts deliberately do not support files (attachments). The forum posts are documents, but not "files," per se. They all live in a single sqlite db, though.

Further, I would appreciate any Case-Studies of OSS (preferably .NET) that uses SQLite as its only source of data.

Fossil itself may be the best case study, but there has been no "study" in the formal sense. It "just works," and has done so for very nearly 15 years now.

(3) By Richard Hipp (drh) on 2022-06-03 23:22:27 in reply to 2 [link] [source]

You can see the sizes of the 250 largest BLOBs in the SQLite source repository at this page: https://sqlite.org/src/bigbloblist

(4) By Trudge on 2022-06-04 17:49:48 in reply to 1 [link] [source]

Are you hoping to save the actual image file in the database? Have you considered simply saving the file path in the database instead? I have found this works as well or better than a database of images.

(5) By Stephan Beal (stephan) on 2022-06-04 18:02:49 in reply to 4 [link] [source]

Have you considered simply saving the file path in the database instead?

If the OP is interested in using sqlite for web apps, keep in mind that every single place where a web app writes to a filesystem is a potential security hole (read up on WordPress's history of security holes for facepalmingly numerous examples). The more tightly filesystem writes are sandboxed, the more secure the app is going to be against intrusion. If an app must write to the filesystem, as opposed to a remote db server, it "really should" be marshaled through a single gateway (a single db file) and not to arbitrary directories or files. Any other approach is a security problem just waiting to happen.

(6.1) By midijohnny on 2022-06-05 08:59:50 edited from 6.0 in reply to 1 [link] [source]

If you do decide to store your images/files directly in the DB - there is something else to be aware of which might be useful for you.

That is the SQLite Archive File (sqlar) mechanism. Rather than re-inventing the wheel, you could build on top of this feature.

You would then get the advantage of out-of-the-box tooling, like the '.archive' commands, to extract/populate your table directly from the CLI — so long as your CLI was compiled to include this feature; use '.help' to list the available commands.

You can even mount a sqlar-enabled DB using FUSE, allowing you to simply treat your DB as a filesystem.

See also the documentation on SQLAR - The SQLite Archiver, which is a standalone tool specifically for working with sqlar DBs — though as noted above, the standard CLI often comes with the same features.
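Because sqlar is just a documented table layout — sqlar(name, mode, mtime, sz, data), where data is zlib-compressed only when that saves space and sz records the uncompressed size — your own code can read and write it directly. A sketch in Python:

```python
import sqlite3, time, zlib

# Sketch of the documented sqlar table layout (sqlite.org/sqlar.html).
# 'data' is zlib-compressed only when that actually shrinks it; 'sz'
# holds the uncompressed size so readers know whether to inflate.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sqlar(
    name  TEXT PRIMARY KEY,  -- file path
    mode  INT,               -- Unix mode bits
    mtime INT,               -- last-modified time
    sz    INT,               -- original (uncompressed) size
    data  BLOB               -- compressed or raw content
)""")

def sqlar_write(con, name, content, mode=0o644):
    packed = zlib.compress(content)
    if len(packed) >= len(content):  # compression did not help: store raw
        packed = content
    con.execute("INSERT OR REPLACE INTO sqlar VALUES (?,?,?,?,?)",
                (name, mode, int(time.time()), len(content), packed))

def sqlar_read(con, name):
    sz, data = con.execute("SELECT sz, data FROM sqlar WHERE name=?",
                           (name,)).fetchone()
    return data if len(data) == sz else zlib.decompress(data)

sqlar_write(con, "docs/readme.txt", b"hello sqlar " * 100)
assert sqlar_read(con, "docs/readme.txt") == b"hello sqlar " * 100
```

Any database written this way can then also be manipulated with the CLI's '.archive' commands or the standalone sqlar tool.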

(7) By anonymous on 2022-06-05 14:11:32 in reply to 1 [link] [source]

1 to 2 MB is on the larger side of the spectrum relative to typical page sizes; you would probably end up with many half-empty database pages that way. The good news is that TIFF is pretty compressible, and passing the image through an efficient compressor (e.g. something LZW-based) should in theory compress the data well without using much CPU in the process. You then end up with hopefully much smaller blobs that can be packed more efficiently into the typical page size.
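A sketch of the compress-before-store idea, using zlib (DEFLATE) as a stand-in for the LZW-style compressor suggested above. Note the caveat that TIFFs saved with internal compression already applied will not shrink much further; the synthetic data here compresses far better than real images would:

```python
import sqlite3, zlib

# Sketch: compress the bytes before storing, decompress after reading.
raw = b"\x00\x01" * 500_000  # ~1 MB of highly repetitive stand-in data
packed = zlib.compress(raw, level=6)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images(name TEXT PRIMARY KEY, data BLOB)")
con.execute("INSERT INTO images VALUES (?, ?)", ("scan.tiff", packed))
con.commit()

(stored,) = con.execute(
    "SELECT data FROM images WHERE name='scan.tiff'").fetchone()
assert zlib.decompress(stored) == raw
assert len(packed) < len(raw)
```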

(8) By Trudge on 2022-06-05 14:20:21 in reply to 5 [link] [source]

Very good points, and I agree completely. When I developed for clients, all data from a user was verified and sanitized at the server end (no JavaScript) to secure against SQL injection and other possible problems. As much as possible I provided drop-down lists, radio buttons and check boxes for user input rather than a text box, but even that was verified against a strict set of 'do I want to allow this input' options. Any uploaded files were saved to a specific directory, not one chosen by the user.

Given all that, it was a constant effort to stay ahead of the curve.

(9) By Donal Fellows (dkfellows) on 2022-06-06 09:18:00 in reply to 1 [link] [source]

Is it feasible to use SQLite as a store of such Images and Files

My experience is that BLOBs are just fine up to a gigabyte in size, provided you write to them only when you are creating them. Updating a BLOB — including appending to it — is expensive. My application (a kind of simulator) needed multigigabyte BLOBs that were built piecemeal, so I ended up using many of them and concatenating them in application space on read out (not difficult; just need to keep a sequence number with each BLOB).
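The many-immutable-chunks scheme described above can be sketched in a few lines; the table and chunk size here are illustrative choices, not a prescription:

```python
import sqlite3

# Sketch: write large content as many write-once chunk BLOBs keyed by
# (name, seq), then concatenate them in application space on read-out.
CHUNK = 4096  # arbitrary chunk size for this sketch

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunks(name TEXT, seq INT, data BLOB,"
            " PRIMARY KEY(name, seq))")

def write_chunked(con, name, content):
    for seq, off in enumerate(range(0, len(content), CHUNK)):
        con.execute("INSERT INTO chunks VALUES (?,?,?)",
                    (name, seq, content[off:off + CHUNK]))
    con.commit()

def read_chunked(con, name):
    rows = con.execute(
        "SELECT data FROM chunks WHERE name=? ORDER BY seq", (name,))
    return b"".join(r[0] for r in rows)

big = bytes(range(256)) * 100  # 25,600 bytes -> 7 chunks
write_chunked(con, "sim-output", big)
assert read_chunked(con, "sim-output") == big
```

Each chunk is created once and never updated, which sidesteps the expensive BLOB-update path.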

This was replacing a directory full of lots and lots of binary files (tens of thousands of them, each potentially quite large, enough to be really painful with most operating systems). SQLite works a lot better for this application! The only downside was that write performance was a lot slower by default, but in my case I could mitigate that by disabling fsync() calls (fine for my application; you aren't necessarily in the same position) which brought write performance up for a single BLOB to about the same as direct I/O, i.e., literally hardware limited. Read performance is absolutely great, to the point where nobody on my team has been worried about measuring it.
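The fsync() trade-off mentioned above is a one-line pragma. A sketch, with the usual warning spelled out — a crash or power loss can corrupt the database in this mode, so it is only safe for data you can rebuild:

```python
import sqlite3, tempfile, os

# Sketch: trade durability for bulk-write speed.
path = os.path.join(tempfile.mkdtemp(), "bulk.db")
con = sqlite3.connect(path)
con.execute("PRAGMA synchronous=OFF")   # skip fsync() on commit
con.execute("PRAGMA journal_mode=WAL")  # often paired for bulk loads
con.execute("CREATE TABLE blobs(id INTEGER PRIMARY KEY, data BLOB)")

with con:  # one transaction around the whole batch also matters a lot
    con.executemany("INSERT INTO blobs(data) VALUES (?)",
                    ((b"x" * 1024,) for _ in range(1000)))

(n,) = con.execute("SELECT count(*) FROM blobs").fetchone()
assert n == 1000
```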

(10) By anonymous on 2023-10-12 16:54:16 in reply to 1 [link] [source]

While SQLite is generally very good at storing binary data (BLOBs) in the DB, there might be some more things to consider:

If you expose stored files via other access methods, e.g. HTTP, it might be beneficial if the files are on disk. Let me explain: let's assume you have a web app to upload images. The same web app lets the user preview the uploaded images in an HTML page. Now, every time an image is requested, you have to load it from the DB and send it as an HTTP response. All 'If-Modified-Since', 'ETag' and cache handling is on you. It might be better for performance, security and complexity reasons to let an HTTP server (or your web framework) handle all this, giving it direct access to the on-disk files.
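To make the "on you" part concrete, here is a sketch of the minimum conditional-request handling an app must do itself when serving images out of the DB (framework-free, with a made-up table; a file-serving web server does all of this for free):

```python
import sqlite3, hashlib

# Sketch: hand-rolled ETag / conditional-GET handling for DB-stored images.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images(name TEXT PRIMARY KEY, data BLOB)")
con.execute("INSERT INTO images VALUES (?, ?)",
            ("logo.png", b"fake-image-bytes"))

def serve(con, name, if_none_match=None):
    (data,) = con.execute("SELECT data FROM images WHERE name=?",
                          (name,)).fetchone()
    etag = '"%s"' % hashlib.sha256(data).hexdigest()[:16]
    if if_none_match == etag:
        return 304, etag, b""  # client's cached copy is still valid
    return 200, etag, data

status, etag, body = serve(con, "logo.png")
assert status == 200
status2, _, body2 = serve(con, "logo.png", if_none_match=etag)
assert (status2, body2) == (304, b"")
```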

Second is about backup: if files are stored in the DB, your DB backup procedure (think "sqlite3 db.sqlite .backup") might take much longer than it would without all those BLOBs. In my web apps, I usually do an hourly DB backup (which runs quite fast) and back up my binary files via rsync once per day.
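The CLI's ".backup" corresponds to SQLite's online backup API, which copies the whole file, BLOBs included — which is exactly why big BLOBs stretch the backup window. Python exposes it as Connection.backup(); a sketch:

```python
import sqlite3

# Sketch: programmatic equivalent of the CLI's ".backup" command.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE files(name TEXT PRIMARY KEY, data BLOB)")
src.execute("INSERT INTO files VALUES (?, ?)", ("a.bin", b"\x00" * 10_000))
src.commit()

dst = sqlite3.connect(":memory:")  # in practice a file, e.g. backup.db
src.backup(dst, pages=100)         # copies 100 pages per step

(n,) = dst.execute("SELECT count(*) FROM files").fetchone()
assert n == 1
```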

Third: Sometimes it is easier for development and debugging to have all files directly on disk. So, I can SFTP into a server and see what's there. Additionally, I can up/download individual files without having to open a database shell first.

Fourth: Some non-C SQLite drivers are not super-fast when it comes to reading large BLOBs, for instance https://github.com/cvilsmeier/sqinn-go, which I use for all my database work.

So, it is as always: It depends, and your mileage may vary.

(11) By Donal Fellows (dkfellows) on 2023-10-13 13:06:28 in reply to 10 [link] [source]

Sometimes it is easier for development and debugging to have all files directly on disk.

The more files you have, the less that becomes true. With just 10 files, having them on disk is pretty easy. With thousands of them, using a database instead looks mighty attractive, especially as then you're no longer relying on filenames for all the metadata. Also, having them in a DB means never needing to introduce workarounds for OS limitations on the number of file descriptors you may have open...

(12) By Simon Slavin (slavin) on 2023-10-14 12:24:25 in reply to 1 [link] [source]

Including these relevant links, not because they're vital, but because nobody seems to have mentioned them yet and future readers on this subject may not have found them:

https://www.sqlite.org/appfileformat.html

https://golangexample.com/an-unscientific-benchmark-of-sqlite-vs-the-file-system-btrfs/

(13.1) Originally by Richard Hipp (drh) with edits by Stephan Beal (stephan) on 2023-10-14 14:12:39 from 13.0 in reply to 12 [link] [source]

Thanks for the links, especially the second one which I did not previously know about. I have added the second link to the "Related Studies" section of the 35% Faster Than The Filesystem document in the SQLite documentation.

(14.2) By Nuno Cruces (ncruces) on 2023-10-15 09:56:48 edited from 14.1 in reply to 10 [link] [source]

In general, if you want to store larger blobs in the database, you want to use the blob streaming API, and you want your driver to support it. I guess Sqinn could support at least reading blobs that way?

Also, not sure what you use Sqinn from, but my cgo-free Go driver has decent support for streaming blobs (they even implement io.ReaderFrom/WriterTo).