Cross process change notification
(1.1) By berzerker on 2022-09-01 05:18:29 edited from 1.0 [link] [source]
So I've been spending countless hours looking for an API through which a SQLite trigger or update_hook could report changes made to a table from a different process (WAL mode). This would let me generate tailable log files from a separate process. In a nutshell, running:
sqlite-tailer /path/to/database.sqlite
would start producing logs on STDOUT like:
{"op":"insert","name":"table_name","data": {"col": "value", ....}}
A solution like Litestream won't work for me because it captures changes as whole WAL pages, and I don't want clients to give up their regular flow of opening the database in-process with their usual libraries. I've tried multiple things so far. First I tried update_hook, which didn't work; I later learned it only fires in the same process that makes the change, not across processes (which is beyond me for a DB that supports multiple processes). Second I tried registering a function and creating a trigger to invoke it with the rowid. That doesn't work either; in fact it causes the trigger to fail in any process where the function was not registered but the trigger was invoked. Then I tried TEMP triggers, and they didn't work either.
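The second failed attempt can be reproduced in a few lines. This is an illustrative sketch (table and function names are made up): an application-defined function registered on one connection does not exist on other connections, so a persistent trigger that calls it breaks every writer that didn't register it.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT)")

# Register a notification callback on THIS connection only.
events = []
writer.create_function("notify_change", 1, lambda rowid: events.append(rowid))
writer.execute(
    "CREATE TRIGGER t_ins AFTER INSERT ON t "
    "BEGIN SELECT notify_change(NEW.rowid); END"
)

# Works here, because notify_change is registered on this connection.
writer.execute("INSERT INTO t(v) VALUES ('ok')")
writer.commit()

# A second connection (standing in for a second process) has no such
# function, so the same INSERT now fails.
other = sqlite3.connect(path)
try:
    other.execute("INSERT INTO t(v) VALUES ('boom')")
except sqlite3.OperationalError as e:
    print("other connection failed:", e)  # no such function: notify_change
```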
So, to cut my story short: is there any way I can receive a callback/trigger/hook (essentially anything) that notifies me of changes made by other processes?
Update:
Thanks for the inspiration from everyone here; I ended up writing a full solution for myself. I've already got it deployed on a production server with medium-sized site traffic (13 writes/sec and thousands of reads/sec), with replicas running on 3 nodes in 3 different timezones. There are still a few kinks I'd like to iron out, but it qualifies as a fully working MVP. I am tracking it here. Any contributions and feedback are welcome!
(2) By ddevienne on 2022-08-23 10:29:34 in reply to 1.0 [link] [source]
Are you aware of https://www.sqlite.org/sessionintro.html?
Not exactly what you want, I suspect, but could provide insights on the subject.
(3) By berzerker on 2022-08-23 15:41:38 in reply to 2 [link] [source]
Interesting. Will this work across processes, i.e. one process making changes while another generates the change-set? Any help is appreciated :)
(4) By Keith Medcalf (kmedcalf) on 2022-08-23 15:56:24 in reply to 1.0 [link] [source]
I would suggest that you look at what is euphemistically known as "middle-ware".
There is absolutely no reason that SQLite3 would have the capabilities to do what you want. You need to put that in your application. Each and every one of them.
SQLite3 does not have a "central server" that can do what you want.
You can write one, use one that already exists, or simply write your application to do what it is you want to do.
(5) By anonymous on 2022-08-23 20:53:16 in reply to 1.0 [link] [source]
Just hook the file system, run on any change to the database file, and be careful not to DoS yourself.
(8) By berzerker on 2022-08-24 05:02:51 in reply to 5 [source]
Run what? A linear scan to figure out what changed?
(9) By Larry Brasfield (larrybr) on 2022-08-24 05:19:16 in reply to 8 [link] [source]
A change log would readily be ordered by ascending time of change, so getting the latest entries would not require a linear scan. What would run is a query resembling: SELECT * FROM Changes c WHERE c.date > ?lastNotedChange; this visits only a (typically) small portion of the B-tree by which Changes is ordered.
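A minimal sketch of that idea, with an assumed schema (the Changes table and its columns here are illustrative): with an index on the timestamp column, picking up new entries is an index seek from the last-seen position, not a scan of the whole table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE Changes(
        id INTEGER PRIMARY KEY,        -- monotonically increasing
        date REAL NOT NULL,            -- timestamp of the change
        tbl TEXT NOT NULL,             -- which table changed
        rowid_changed INTEGER NOT NULL -- which row changed
    )
""")
db.execute("CREATE INDEX idx_changes_date ON Changes(date)")

db.executemany(
    "INSERT INTO Changes(date, tbl, rowid_changed) VALUES (?, ?, ?)",
    [(1.0, "users", 1), (2.0, "users", 2), (3.0, "orders", 7)],
)

# Only the entries newer than the last one we noted are visited.
last_noted_change = 1.5
rows = db.execute(
    "SELECT tbl, rowid_changed FROM Changes WHERE date > ? ORDER BY date",
    (last_noted_change,),
).fetchall()
print(rows)  # [('users', 2), ('orders', 7)]
```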
(6) By Keith Medcalf (kmedcalf) on 2022-08-23 21:11:18 in reply to 1.0 [link] [source]
Write a trigger so that every table that might be changed, and for which you want a log entry generated, logs the change.
Then simply run yet another process and have it look at the data_version of the database from time to time (per millisecond, per second, per minute, per hour, whatever frequency your little heart desires). If the data_version has changed, then there might be entries in the log table for you to do something with. Go do.
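The polling loop described above can be sketched as follows (a minimal illustration, with an assumed change_log table): PRAGMA data_version, as seen from one connection, changes whenever another connection commits to the database, so the watcher can poll it cheaply and only read the log table when the value moves.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "watched.sqlite")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE change_log(id INTEGER PRIMARY KEY, payload TEXT)")
writer.commit()

# The watcher plays the role of the separate monitoring process.
watcher = sqlite3.connect(path)
before = watcher.execute("PRAGMA data_version").fetchone()[0]

# Some other process commits a change...
writer.execute("INSERT INTO change_log(payload) VALUES ('something changed')")
writer.commit()

# ...and the watcher notices without reading any tables.
after = watcher.execute("PRAGMA data_version").fetchone()[0]
if after != before:
    entries = watcher.execute("SELECT payload FROM change_log").fetchall()
    print("log has new entries:", entries)
```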
(7.1) By berzerker on 2022-08-24 05:50:58 edited from 7.0 in reply to 6 [link] [source]
I like this approach. It will double the write load, but maybe we can get away with that. What is the easiest way to serialize NEW.* into something that can be dumped into a BLOB column? From my monitoring process I can then have a temp trigger on the log table to drain it as entries get inserted. The only thing left to figure out, I believe, is how to turn NEW.* inside a trigger into a plain BLOB. That way all tables can be dumped into a single table and we can subscribe to that table.
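One way to flatten NEW.* (an assumption on my part, not necessarily what the poster ended up doing): SQLite's built-in json_object() can pack the new row's columns into a single TEXT payload inside the trigger body, so changes from many tables can share one log table. Table and column names below are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE change_log(id INTEGER PRIMARY KEY, tbl TEXT, payload TEXT);

    -- Serialize NEW.* into one JSON payload per change.
    CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
        INSERT INTO change_log(tbl, payload)
        VALUES ('users',
                json_object('id', NEW.id, 'name', NEW.name, 'email', NEW.email));
    END;
""")

db.execute("INSERT INTO users(name, email) VALUES ('ada', 'ada@example.com')")
row = db.execute("SELECT tbl, payload FROM change_log").fetchone()
print(row)
```

The downside is that the trigger must name every column explicitly; SQLite has no wildcard for NEW.* inside a trigger body, so the triggers have to be regenerated when the schema changes.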
EDIT: I think I figured out something keeping it really lightweight.
(10) By berzerker on 2022-09-01 05:15:36 in reply to 6 [link] [source]
Thanks for the inspiration; I ended up writing a full solution. I've already got it deployed on a production server with medium-sized site traffic (13 writes/sec and thousands of reads/sec), with replicas running on 3 nodes in 3 different timezones. There are still a few kinks I'd like to iron out, but it qualifies as a fully working MVP. I am tracking it here.
In the future I'd like to keep contributing to it, and ideally do all of this without triggers. Right now I'm thinking of introducing support for streaming whole tables to restore snapshots as instances come up, plus fixing an identified bug where the log gets played twice when a node comes up. There's still a long way to go, but I hope to make it useful for everyone!
(11) By anonymous on 2022-09-01 06:25:29 in reply to 10 [link] [source]
It is not clear whether you are doing any sort of conflict resolution for conflicting writes. Care to shed some light on how you deal with that?
(12) By berzerker on 2022-09-01 14:03:05 in reply to 11 [link] [source]
So it's last-writer-wins. Since I am using Raft, two competing writes can each have local changes written to SQLite, but the last proposer to get consensus wins, and its changes are eventually synced to all nodes in the cluster.
To scale these writes, today I run 16 Raft groups. Every time a node tries to propose a change, it generates a hash of table_name + rowid and uses consistent hashing to pick one of the Raft groups as the master for the proposal. Ideally these Raft groups can be spread out evenly throughout the cluster (today there is a manual shuffle method that does this randomly), which helps you scale the cluster up.
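The routing step can be sketched like this (names and the hash choice are my assumptions; the poster says they use consistent hashing, while this sketch shows the simpler modulo-over-a-stable-hash variant): hashing table_name + rowid guarantees that all writes to the same row are serialized through the same Raft group.

```python
import hashlib

NUM_RAFT_GROUPS = 16  # the poster runs 16 groups

def pick_raft_group(table_name: str, rowid: int) -> int:
    """Deterministically route a (table, rowid) write to one Raft group."""
    key = f"{table_name}:{rowid}".encode()
    digest = hashlib.sha1(key).digest()
    # Use a stable hash, not Python's hash(), so every node agrees.
    return int.from_bytes(digest[:8], "big") % NUM_RAFT_GROUPS

g1 = pick_raft_group("users", 42)
g2 = pick_raft_group("users", 42)
print(g1, g2)  # the same key always maps to the same group
```

Proper consistent hashing (a ring of virtual nodes) would additionally minimize how many keys move when the number of groups changes, which plain modulo hashing does not.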