SQLite Binary Log support!

(1) By Tyke (tyke007) on 2021-08-09 16:00:07

We have session/changeset/patchset for sqlite, but they are quit limited compare to Mysql Binary Log.


If there is a function generate a binary log for each write transition, then late able to replay the log on a slave instance and get same result, then this can be done with it:


a) save binary log into kafka or other message queue, then you get a quit scalable master/slave cluster.


b) Use Raft leader replicate binary log instead WAL frame,  the raft entity size will be much small compare to https://github.com/canonical/dqlite.


c) You can setup a memory database with binary log store into kafka, use it with Raft you get a high performance and high availability cluster.


Here is a simple planning to add binary log:


1) when start a write transaction create a binary log entity object
2) for each sqlite_step call from user code, save the SQL and bind value into  binary log entity object. (check duplicate SQL and value, save as ref)
3) when commit write transaction, save serialized binary log entity object into memory block. 
4) maybe add changed page hash into the log object, to prevent misuse. (and a hash chain in order for each binary log entity)


To replay it to a database handle, first compare hash, then exec sqlite3_step in order.  Before each step do the SQL statement init if not yet, and bind value, compare changed page hash.


Is this plan feasible?

(2) By Simon Slavin (slavin) on 2021-08-10 00:32:28 in reply to 1 [link]

Your post refers to a ton of things that SQLite doesn't need or use: Raft, Kafka, clusters, binary logs.  They are tools which allow you to do something.  But we don't know what that something is, which means that we don't know how to help you.

Can I ask what you're trying to do that SQLite doesn't do ? And can you explain it without referring to a tool or technique that works with another DBMS ?  You've found the session extension.  You know what it does.  What doesn't it do that you want ?

(3) By Tyke (tyke007) on 2021-08-10 05:08:31 in reply to 2 [link]

Sorry for me bad expression ability, I will try again.

Sqlite has nice performance without lock, on my test it handle 100K write transaction per second with memory db. This kind performance it can be used on a lot new area.

With EXCLUSIVE lock the performance reduce a lot, but you get the concurrent read/write transaction with multi thread. in this mode the write thread will blocked to wait checkpoint. On my test around 10K write transaction per second .

That is why I want binary log, for each write transaction generate a log entity, replay it in order you will get the same result on a diff db handle. In this case you don't have to wait on LOCK for checkpoint. and without LOCK you get much higher performance. And if you don't loss log then you also get persistence storage.

Use it with Raft will allow you have a system able to provide service even one of server crashed.

The log entity should throw error if the SQL include RANDOM(), DATE() function. (so then don't get diff result when replay). It also should exclude some SQL like PRAGMA journal_mode=DELETE.

The session extension is not able to handle scheme change, so it is not suit for the task. The binary log should generate for each transaction, session extension is design for log transaction in batch, and it only work with primary key.

https://github.com/benbjohnson/litestream do what I need, it replicate SQLite change into other server. But it use cross process lock do do the job. which result in poor performance compare to in memory DB or LOCK-FREE mode(At least 1000 times fast).

I hope I explain myself better.

(4) By Simon Slavin (slavin) on 2021-08-10 15:15:39 in reply to 3 [link]

That is far easier for me to understand.  Thank you.  I hope other readers will now understand your questions and be able to answer them.  I also understand why you cannot use the session extension.

I do have one caution for you.  You can test several configurations by making a lot of changes as fast as possible.  And you will get a different speed from each one and find out which is fastest.  Good.  However, this reflects only the situation where you are making a lot of changes as fast as possible.  If you test a situation where the changes are happening less often, a different configuration can be faster.  So your result may be correct, but not useful for any real situation.

Some reasons are that caching happens at several levels in your computer and storage subsystem, and some systems wait for inactivity, or for the cache to be full, before writing their cache.  And your computer tries to write the cache at the same time as running operations in other processes.  Things happen with different interlacing, in a different order.

So if your real application is not going to make a lot of changes as fast as possible, this may not be such a useful test.

(5) By Warren Young (wyoung) on 2021-08-10 18:41:01 in reply to 1 [link]

You mention dqlite, but have you tried and rejected the other major distributed SQLite variants, [BedrockDB] and [rqlite]? If so, what makes them unsuitable?

[BedrockDB]:  https://bedrockdb.com/
[rqlite]:     https://github.com/rqlite/rqlite

(6) By Tyke (tyke007) on 2021-08-11 05:35:37 in reply to 4 [link]

With a master/slave setup, I can dispatch read request into diff slave server by SQL, so the cache is more efficient.


And yes I plan to use it with high speed write system, and I think memory database with reliable persistent log will be a good solution.

(7) By Tyke (tyke007) on 2021-08-11 05:55:18 in reply to 5 [link]

Thanks for the tips, and yes I has study the BedrockDB and RQLite.

Correct me if I am wrong, I think RQLite is not suit for the job. It is not a library solution and have poor performance, and you must use it with RESTful API. And it is hard to dispatch Stored Procedure as one log entity like BedrockDB plug dose.


My idea is come from BedrockDB great design. But BedrockDB is too heavy and not a library solution like DQLite. And it limit into linux X86 platform.

BedrockDB is CPU/Memory hungry and rely on NVME SSD raid with Memory-mapped file, and the design reply on lock cross multi thread.


With Binary log entity you can use memory DB handle, and late you can switch to on-disk mode with exclusive lock if your DateBase is too big.  If you are in single thread mode like NodeJS, you can disable lock and dispatch huge read query into slave node. 

Binary log entity also can be used for audit, time travel.

The idea is to provide a function can be used for multi goals.

(8) By tom (younique) on 2021-08-13 10:58:00 in reply to 5 [link]

This is not related to the thread starter's question, but the disadvantage of BedrockDB is that it doesn't compile for Windows.