SQLite Forum

binary .dump format
Compression tends to reduce the effectiveness of deduplication, since the compressed versions of similar files have very different byte sequences. I'd prefer to avoid the extra overhead of compression altogether.

There *are* some attempts to make compression friendlier to deduplication (like `gzip --rsyncable`), so this is still worth a try. Even better, zstd has a `--rsyncable` flag, so let's try:

```
$ sqlite3 ext.db .dump | zstd --rsyncable -o ext.dump.zstd
36.63%   (4217978198 => 1544940304 bytes, ext.dump.zstd) 
$ time sqlite3 posts.db .dump | zstd --rsyncable -o posts.dump.zstd
51.47%   (98522848018 => 50713380137 bytes, posts.dump.zstd) 
real	31m16.458s
```

This produces a compressed dump that's 20-50% smaller than the input DBs (!), partially thanks to omitted index data. However, `sqlite3 .dump` maxes out a CPU core and only produces output at ~50MB/s, which is much slower than a `VACUUM`. I'll test again after 24 hours to see how big the incremental is, but this is probably the best solution that doesn't require additional code. It's just somewhat slow, thanks to all the extra text formatting.
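
As a sanity check on the ~50MB/s figure, the rate can be recomputed from the `posts.db` run above. This is only a rough sketch: it assumes `awk` is available, the variable names are illustrative, and it uses the wall-clock `real` time, so it measures the whole pipeline rather than `.dump` alone.

```shell
# Figures copied from the posts.db run in the transcript above.
bytes=98522848018      # uncompressed .dump size, as reported by zstd
seconds=1876.458       # real 31m16.458s = 31*60 + 16.458

# Divide bytes by elapsed seconds, then scale to MB (10^6) and MiB (2^20).
rate=$(awk -v b="$bytes" -v s="$seconds" \
  'BEGIN { printf "%.1f MB/s (%.1f MiB/s)", b / s / 1e6, b / s / 1048576 }')
echo "$rate"
# prints: 52.5 MB/s (50.1 MiB/s)
```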

Edit: actually, this is even better than I realized. In WAL mode the .dump reader doesn't block writers, so spending 30 minutes generating a dump isn't a real problem!
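
Since this only holds when the database really is in WAL mode, it may be worth checking before starting a long dump. A minimal sketch, assuming the `sqlite3` CLI is on the PATH; `is_wal` is a hypothetical helper name, not something SQLite provides:

```shell
# Hypothetical helper: succeeds only if the database file reports WAL journaling.
# `PRAGMA journal_mode;` with no argument queries the current mode without changing it.
is_wal() {
  [ "$(sqlite3 "$1" 'PRAGMA journal_mode;')" = "wal" ]
}
```

For example, `is_wal posts.db && sqlite3 posts.db .dump | zstd --rsyncable -o posts.dump.zstd` would skip the dump when the database is in a rollback-journal mode, where a long-running reader can block writers.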