binary .dump format
A binary dump format is not only unavailable, it is not even theoretically clean to define: any byte sequence can occur inside a blob, which makes reserved format bytes impossible, or at least very costly. Something like MIME boundaries would work, but it is heavy on checking for boundary matches inside the content, which matters little for typical e-mail yet would matter a lot for DBs. It also means even small blobs would carry large boundary identifiers, which eats into the saving.

I suppose a best-case enhancement for large blob data would be 3-to-4 encoding (a.k.a. Base64) rather than the current hex representation in .dump files, which gives you a ~33% increase over the blob size instead of a ~100% increase[1], and is deduplication-safe.
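To put rough numbers on that, here is a minimal Python sketch (plain arithmetic, not SQLite code) comparing the encoded size of a single blob under the two schemes; the two-character '0x'/'0z' prefixes follow the notation used in this post:

```python
import base64

blob = bytes(range(256)) * 4  # a 1024-byte sample blob

hex_len = 2 + 2 * len(blob)                # '0x' marker + two hex chars per byte
b64_len = 2 + len(base64.b64encode(blob))  # hypothetical '0z' marker + Base64 text

print(f"blob:   {len(blob)} bytes")
print(f"hex:    {hex_len} bytes (+{100 * (hex_len - len(blob)) / len(blob):.0f}%)")
print(f"base64: {b64_len} bytes (+{100 * (b64_len - len(blob)) / len(blob):.0f}%)")
```

For a 1024-byte blob this prints roughly +100% for hex and +34% for Base64, matching the estimates above.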

This change, or an alternative to it, would be needed on both the .dump side and the .import side, perhaps indicated by starting a Base64 blob literal with '0z' instead of the '0x' used for hex, or some such. It should not be much effort to implement either, but it is certainly not available at this point.
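For illustration only, a sketch of the round trip such a convention implies, in Python rather than the shell's C; the '0z' prefix is the hypothetical marker proposed above, not anything SQLite currently recognises:

```python
import base64

def encode_blob(blob: bytes) -> str:
    # .dump side: emit a Base64 literal with the proposed '0z' marker.
    return "0z" + base64.b64encode(blob).decode("ascii")

def decode_blob(text: str) -> bytes:
    # .import side: dispatch on the marker.
    if text.startswith("0z"):
        return base64.b64decode(text[2:])
    if text.startswith("0x"):
        return bytes.fromhex(text[2:])
    raise ValueError("unrecognised blob literal")

original = b"\x00\xffbinary\x00data"
assert decode_blob(encode_blob(original)) == original
```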

Andreas' compression suggestion is better for whole-file deduplication mechanisms, but it defeats intra-file data-section (byte-run) deduplication. Also, if the data is similar across blobs, compression will reduce size well, but if the blobs are random or noise-like it will do more harm than good (the same reason a PNG of a photo is much, much larger than the JPEG, while a PNG of a solid-colour background weighs almost nothing and its JPEG is still huge).
Text compresses quite well (because of the similarity of a language's lexemes and morphemes), so if the ratio of text to blobs in the DB is high, compression may do well regardless of the above. I suppose testing on your data is needed. Either way, I think that suggestion is the most viable avenue for you to explore if size-saving is your goal.
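To get a feel for that difference before committing, a small Python experiment with zlib (stand-in inputs, not your data) shows both extremes:

```python
import os
import zlib

noise = os.urandom(100_000)                                    # noise-like blob
text = (b"select name from sqlite_master; " * 4000)[:100_000]  # repetitive text

for label, data in (("noise", noise), ("text", text)):
    out = zlib.compress(data, 9)
    print(f"{label}: {len(data)} -> {len(out)} bytes "
          f"({100 * len(out) / len(data):.0f}% of original)")
```

The noise blob stays at ~100% of its size (compression headers can even make it slightly larger), while the repetitive text shrinks to a few percent.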


[1] I use approximations (~nn%) because hex strings include the '0x' marker added to the blob: if the blob is 3 bytes long, the resulting text is 2 + 3 × 2 = 8 bytes, a ~167% increase, and as blob length grows that percentage approaches 100%. Base64 would similarly carry such a marker, but it also pads to 3-byte alignment, so it can end with one or two "=" characters, producing a saw-tooth size-gain graph that smooths out (→ ~33%) for larger blobs.
I.e., lots of small blobs are the worst case, whatever encoding you use.
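A quick sweep over blob sizes shows both effects at once: for the smallest blobs Base64 is no better than hex (at one byte it is actually worse), and its padding saw-tooth then flattens towards ~33% (sizes computed per the conventions above; the '0z' marker remains hypothetical):

```python
import base64

# Overhead of hex ('0x' + 2 chars/byte) vs Base64 ('0z' + 4 chars per
# 3 bytes, '='-padded), as a percentage of the raw blob size.
for n in (1, 2, 3, 4, 5, 6, 30, 300, 3000):
    hex_len = 2 + 2 * n
    b64_len = 2 + len(base64.b64encode(b"\x00" * n))
    print(f"{n:5d} bytes: hex +{100 * (hex_len - n) / n:6.1f}%  "
          f"base64 +{100 * (b64_len - n) / n:6.1f}%")
```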