SQLite Forum

FYI: binary dump

(1) By anonymous on 2021-12-13 10:16:02

Some months ago, there was a forum thread about the possibility of binary database dumps. This sounded like a potentially useful idea, so I implemented it. (GitHub)

(2.1) By Larry Brasfield (larrybr) on 2021-12-13 11:07:46 edited from 2.0 in reply to 1

Interesting. I have a few questions on this.

At the GitHub page you linked, this appears:

  Garbage entered in UTF-16 mode can't be losslessly extracted in UTF-8
  mode and vice versa.  The database file gigo.sqlite is included as an
  example of this.  pragma integrity_check reports no errors, but the
  shell's .dump command produces output that causes errors if you try to
  load it into a new database.

1. Why should that non-reversible .dump scenario not be considered a SQLite bug?

2. Given that a SQLite DB file has a platform-independent format, what advantages accrue from using a "binary database dump" over just copying the (binary) database file? Is it just the reversibility (which will likely be fixed)? Or does it achieve the efficient delta representation mentioned in that forum thread?

3. What sort of conversion time and file size performance do you see?

(4) By anonymous on 2021-12-13 13:50:25 in reply to 2.0

Given that a SQLite DB file has a platform-independent format, what advantages accrue from using a "binary database dump" over just copying the (binary) database file?

You save on size (indexes aren't included in the dump).

You save on complexity (no ensure-everything-is-checkpointed dance needed for WAL mode).
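
For context, the WAL "dance" for a raw file copy boils down to forcing a full checkpoint first, roughly as in this sketch (the function name and error handling are mine):

  #include <sqlite3.h>

  /* Force everything in the -wal file back into the main database so that
  ** copying just the main file yields a self-contained snapshot.  TRUNCATE
  ** mode also resets the WAL to zero bytes; concurrent readers or writers
  ** can make this return SQLITE_BUSY, which is part of the "dance". */
  static int checkpoint_before_copy(sqlite3 *db){
    int nLog = 0, nCkpt = 0;
    return sqlite3_wal_checkpoint_v2(db, "main", SQLITE_CHECKPOINT_TRUNCATE,
                                     &nLog, &nCkpt);
  }

The binary dump sidesteps this because it reads through the ordinary query API, which already sees a consistent snapshot.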

What sort of conversion time and file size performance do you see?

I'll have to amend some of the remarks about dump time. Apparently, the shell is slow at dumping blobs.

Blob-heavy example: a recent sqlite.fossil with a size of 160 MB.

  quantity                  sqlite3      s3bd
  dump time (user+system)   11.76+0.31   0.45+0.15
  dump size                 288 MB       144 MB
  reload time               3.10+0.48    1.14+0.32

Blobless example: a local cache of Disqus threads with a number of indexes, 122 MB.

  quantity                  sqlite3      s3bd
  dump time (user+system)   0.50+0.06    0.43+0.05
  dump size                 60 MB        46 MB
  reload time               2.80+0.48    2.02+0.45

(6.1) By ddevienne on 2021-12-13 14:40:11 edited from 6.0 in reply to 4

Can you please add the times of VACUUM INTO and .backup in the shell?
These are binary dumps as well, so might not suffer as much with blobs,
and would provide a fairer comparison with your new dump.

Also, to have a baseline for your file-system, could you add the size+time of a pure file-copy of the DB?

(11) By anonymous on 2021-12-14 02:17:11 in reply to 6.1

Using the same sqlite.fossil:

  method        time        notes
  cp            0.00+0.05   ¹ ²
  .backup       0.03+0.25   ¹ ³
  vacuum into   0.12+0.26   ¹ ³
  s3bd          0.45+0.15

¹ no space saved
² synchronisation issues
³ not streamable

(12) By ddevienne on 2021-12-14 09:57:03 in reply to 11

First, user+system time is often misleading, especially with parallelism.
I know you do not have any, but still, real or wall-clock rules IMHO.

Second, your test databases are too small; the timings are not reliable enough.
MB/sec bandwidth numbers would be helpful, as would an x-factor or percentage relative to a baseline of your choosing.

Third, as I suspected, speed-wise, SQLite's own binary dumps seem to win.

You are perfectly correct that they do not save space.

That they are not streamable is more debatable.
You'd have to be more specific here, at least from my point of view.

I do not want to detract from what you did.
It shows good thinking, and good execution, from what I saw.
But I think it is helpful to put that in context.

Given this thread, perhaps Richard might consider changes to allow
VACUUM INTO or the Backup-API to drop redundant index information,
leaving their definitions, so they could be recreated later. That would
address your space concerns. I'm not sure Richard will consider it though.

And again, thanks for sharing your work publicly. Very interesting.

(10) By Simon Slavin (slavin) on 2021-12-13 20:16:44 in reply to 4

Given that indexes take up space, and that you're trying to save space, here's an idea: potential indexes.

  CREATE POTENTIAL INDEX album_ISAN ON album(ISAN)

(Note that you cannot combine POTENTIAL with UNIQUE.) This has SQLite store the CREATE command but not actually create the index until it decides it would be the right one to use. At that point, the index would be created, stored and used. The VACUUM command would not preserve these indexes, just the command which would allow them to be recreated. Or perhaps that should be an option for the VACUUM command.

A way for one programmer to implement something like this, without changing SQLite itself, would be: before taking the copy, write a routine that stores the non-UNIQUE CREATE INDEX commands in a table and then drops those indexes, before doing the VACUUM. Their routine to open the database would check that table and then CREATE all these indexes using IF NOT EXISTS.
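
A rough sketch of that routine against the C API, assuming a hypothetical side table named saved_index (the table and function names are mine, not part of any existing tool):

  #include <sqlite3.h>

  /* Append a DROP INDEX statement for each stashed index name. */
  static int drop_cb(void *p, int nCol, char **azVal, char **azCol){
    char **pzSql = (char**)p;
    char *zOld = *pzSql;
    *pzSql = sqlite3_mprintf("%sDROP INDEX IF EXISTS \"%w\";",
                             zOld ? zOld : "", azVal[0]);
    sqlite3_free(zOld);
    return *pzSql==0;          /* abort sqlite3_exec() on OOM */
  }

  /* Run just before the VACUUM: remember the non-UNIQUE index definitions
  ** in saved_index, then drop those indexes. */
  static int stash_and_drop_indexes(sqlite3 *db){
    char *zDrops = 0;
    int rc = sqlite3_exec(db,
      "CREATE TABLE IF NOT EXISTS saved_index(name TEXT PRIMARY KEY, sql TEXT);"
      "INSERT OR REPLACE INTO saved_index(name, sql)"
      "  SELECT name, sql FROM sqlite_schema"
      "  WHERE type='index' AND sql IS NOT NULL"
      "    AND sql NOT LIKE 'CREATE UNIQUE%';",
      0, 0, 0);
    if( rc==SQLITE_OK ){
      rc = sqlite3_exec(db, "SELECT name FROM saved_index;", drop_cb, &zDrops, 0);
    }
    if( rc==SQLITE_OK && zDrops ){
      rc = sqlite3_exec(db, zDrops, 0, 0, 0);
    }
    sqlite3_free(zDrops);
    return rc;
  }

The open routine would then walk saved_index and execute each stored statement, rewriting it to the IF NOT EXISTS form (or ignoring "already exists" errors).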

Top-of-my-head initial thoughts on the subject. POTENTIAL is probably a poor word for it, and there are probably better ways to implement the idea.

(18) By anonymous on 2021-12-31 18:58:02 in reply to 4

I'll have to amend some of the remarks about dump time. Apparently, the shell is slow at dumping blobs.

This niggled at me, so I wrote my own textual dump code, which turned out to be just as slow.

Profiling revealed that the culprit is stdio locking; acquiring and releasing a lock for each byte of a dumped blob kills performance.

A less-portable version that brackets the whole operation with flockfile and funlockfile and uses *_unlocked versions of stdio calls runs in less than one tenth of the time.
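
Something like the following illustrates the pattern, assuming POSIX stdio; the function and the hex formatting are illustrative, not the actual dump code:

  #include <stdio.h>

  /* Write a blob as an X'...' hex literal, taking the stdio lock once for
  ** the whole blob instead of once per byte. */
  static void dump_blob_hex(FILE *out, const unsigned char *p, size_t n){
    static const char hex[] = "0123456789abcdef";
    size_t i;
    flockfile(out);                    /* one lock for the entire blob */
    putc_unlocked('X', out);
    putc_unlocked('\'', out);
    for(i=0; i<n; i++){
      putc_unlocked(hex[p[i]>>4], out);
      putc_unlocked(hex[p[i]&0x0f], out);
    }
    putc_unlocked('\'', out);
    funlockfile(out);                  /* release after the last byte */
  }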

Lesson: profile!

(3) By ddevienne on 2021-12-13 11:59:53 in reply to 1

Interesting, thanks for sharing.

After looking at the code a little bit, here are various thoughts that crossed my mind:

  • Big Endian. Given that most of our CPUs are LE (ARM can do both, but is LE by default), why BE?
  • Varint encoding. I just wonder if it's worth it. One day I need to compare the size vs speed tradeoff.
  • No Pages/Blocks. This basically precludes any kind of parallelism when processing tables, for example.
  • Custom record-format. Why not reuse the SQLite record-format as-is?
  • Not using blob IO. This increases memory use for large blob columns.

Thanks, --DD

PS: Although regarding my last point, I don't see how you can deal with that generically...
I think once you've step()'ed, the whole record is loaded in memory,
instead of sqlite3_column_blob() lazily fetching the blob on demand.
So it's basically too late to use incremental blob IO... There would
need to be native support in SQLite to ask for the blob handle instead
of the blob value itself at the SQL level (perhaps with a size threshold),
and a new column API to get that handle.
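
For reference, this is roughly what incremental blob I/O looks like with the current API: the caller has to know the table, column and rowid up front and open a handle itself, which is exactly what a generic dumper cannot easily do after step(). Table and column names below are placeholders:

  #include <sqlite3.h>

  /* Stream one blob in fixed-size chunks via sqlite3_blob_open()/read(),
  ** instead of pulling the whole value into memory with sqlite3_column_blob().
  ** "t" and "content" are placeholder table/column names. */
  static int stream_blob(sqlite3 *db, sqlite3_int64 rowid,
                         int (*sink)(const void*, int, void*), void *pCtx){
    sqlite3_blob *pBlob = 0;
    char buf[8192];
    int rc = sqlite3_blob_open(db, "main", "t", "content", rowid, 0, &pBlob);
    if( rc!=SQLITE_OK ) return rc;
    int nTotal = sqlite3_blob_bytes(pBlob);
    for(int ofst=0; rc==SQLITE_OK && ofst<nTotal; ofst+=(int)sizeof(buf)){
      int n = nTotal - ofst;
      if( n>(int)sizeof(buf) ) n = (int)sizeof(buf);
      rc = sqlite3_blob_read(pBlob, buf, n, ofst);
      if( rc==SQLITE_OK && sink(buf, n, pCtx) ) rc = SQLITE_ABORT;
    }
    sqlite3_blob_close(pBlob);
    return rc;
  }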

(5) By anonymous on 2021-12-13 14:21:29 in reply to 3

Big Endian. Given that most of our CPUs are LE (ARM can do both, but is LE by default), why BE?

The code still loops through every byte, so there's no speed advantage to be had.

Varint encoding. I just wonder if it's worth it.

If you're always going to compress the output, fixed-width values win. But if the database mostly consists of already-compressed blobs (e.g. a Fossil repository), another compression step is wasted effort.

No Pages/Blocks. This basically precludes any kind of parallelism processing tables for example.

Streamability was one of the design criteria.

Custom record-format. Why not reuse the SQLite record-format as-is?

Because it is not available at the prepared statement level.

(7) By ddevienne on 2021-12-13 15:56:12 in reply to 5

The code still loops through every byte, so there's no speed advantage to be had.

If you don't use varints, and are already in the target encoding (which is very often the case when choosing LE),
then you don't loop through every byte; that's the point. In your code, you'd simply wd(..., &t, sizeof(t)) basically.
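
To make that concrete: a fixed-width little-endian value on a little-endian host can be written out as raw bytes, while a big-endian (or varint) encoding forces a per-byte shuffle first. A generic sketch, not the actual wd() from the dump code:

  #include <stdint.h>
  #include <stdio.h>

  /* Fixed-width LE on an LE host: the in-memory bytes already are the
  ** target encoding, so they can be written directly. */
  static void write_u64_le_native(FILE *out, uint64_t v){
    fwrite(&v, sizeof v, 1, out);
  }

  /* Big-endian output always needs a byte-by-byte shuffle on an LE host. */
  static void write_u64_be(FILE *out, uint64_t v){
    unsigned char b[8];
    for(int i=0; i<8; i++) b[i] = (unsigned char)(v >> (56 - 8*i));
    fwrite(b, sizeof b, 1, out);
  }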

Streamability was one of the design criteria.

One does not prevent the other IMHO.

You can interleave blocks from different tables basically,
as if you are multiplexing the different table-streams.

(8) By Warren Young (wyoung) on 2021-12-13 19:14:03 in reply to 3

why BE?

Surprise! SQLite is big-endian.

I assume it is so because it was born when Sun SPARC machines were the pinnacle of "serious Unix," which in turn caused "network byte order" to pass as a whitewash for "SPARC byte order".

(9) By Richard Hipp (drh) on 2021-12-13 19:55:44 in reply to 8

MIPS, PPC, SPARC, Alpha, and pretty much everything else other than x86 was big-endian. Network byte order is big-endian. I did not foresee that x86/64 would continue to dominate and lead to the extinction of the others.

Also, big-endian has the nice property that unsigned integers compare in numeric order using memcmp(), which I thought at the time would be important. (Turns out it was not.)
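
For anyone who has not seen that property before, a small illustration (values chosen by me for the example):

  #include <assert.h>
  #include <string.h>

  /* Encode a 32-bit unsigned value big end first. */
  static void be32(unsigned char b[4], unsigned v){
    b[0] = (unsigned char)(v>>24);
    b[1] = (unsigned char)(v>>16);
    b[2] = (unsigned char)(v>>8);
    b[3] = (unsigned char)(v);
  }

  int main(void){
    unsigned char a[4], b[4];
    be32(a, 256);
    be32(b, 255);
    /* Big-endian byte strings sort like the numbers themselves. */
    assert(memcmp(a, b, 4) > 0);
    /* A little-endian encoding (00 01 00 00 vs FF 00 00 00) would put 255
    ** after 256 and break that correspondence. */
    return 0;
  }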

(13) By ddevienne on 2021-12-14 10:01:31 in reply to 8

I know that, of course. I've read the file-format doc many times over :).

But we are talking about a new format here. Thus BE does not make sense to me.
SQLite's own format predates it by a couple of decades OTOH, and is rooted in the largely-BE *NIX philosophy.

(14) By Warren Young (wyoung) on 2021-12-14 11:11:01 in reply to 13

The point of this software is to present an exact binary dump of the on-disk SQLite data. Why would it be translated in any way?

(15) By ddevienne on 2021-12-14 11:38:48 in reply to 14

Sorry, does not compute... What do you mean?

If by that software you mean anon's binary dump, it uses the SQLite API,
so it gets back fixed-size numbers in native endianness (i.e. LE...); that's
neither the on-disk data, nor am I requesting any translation.

In fact, I suggest NOT translating the SQLite BE format and varints to LE in memory,
and then putting them back into BE in a different varint encoding, but instead
keeping only SQLite's own unavoidable translation and not adding any.

(16) By anonymous on 2021-12-14 12:00:36 in reply to 13

Actual reason: the author finds debugging easier when multi-byte entities are stored big end first.

(17) By anonymous on 2021-12-14 19:16:50 in reply to 3

  • Custom record-format. Why not reuse the SQLite record-format as-is?

Access to the SQLite record format is not exposed as an API, which means that if you use it, you must include your own implementation in your program. I think this ought to be fixed by exposing an API for it, so that it is not necessary to include a duplicate implementation in the application program.