Publish/subscribe queue

(1) By greenfork on 2022-08-04 04:51:16 [link] [source]

Hi! I would like to use SQLite as a pub/sub broker similar to what you would find in popular event streaming software such as Apache Kafka, RabbitMQ, NATS etc etc. The only bit I'm missing is some kind of a mechanism to notify clients about things happening in the database. In PostgreSQL it is called LISTEN/NOTIFY and would perfectly suit all my needs.

Task:

Set up a client communication where the client needs to update its user interface when a record is inserted into the table by other clients. For example, one client inserts a record, another client prints it to the terminal.

Requirements:

Local only, no network.
Everything is in SQLite, any client can connect via programming interface straight to the database instance.
Cross-platform.

What I've tried:

Temporary trigger and sqlite3_update_hook - they work only for the current db connection.
Trigger - works globally, no way to specify only a single connection to notify.

Current ideas:

Use a virtual table to store db connection file descriptors, use triggers to send data to these descriptors and use poll(2) on clients - so far no idea if this can be implemented at all, documentation also says to treat a db connection object as opaque.
Use inotify(7) and watch for changes in a WAL log on the client - this needs additional code for each client, not entirely in SQLite.
Query the database 60 times a second - I want a stateful UI instead of an immediate mode UI because I would like to save on CPU cycles, meaning, I want to have no load when there are no events.

Would you recommend any viable solution?

(2) By Simon Slavin (slavin) on 2022-08-04 06:43:21 in reply to 1 [link] [source]

None of the following include a way to notify only one conncetion. I don't know how to do that, or why it might be important to you.

I have used a 'changelog' table in a database. The application needed a changelog for other reasons, so I just used it to poll for changes too. It had columns for timestamp, which table was changed (change two tables, make two entries), and a copy of the SQL command. (Once the application was in production – i.e. debugged, as far as I knew – I removed the column with the SQL command in. It made the table take up less space, which sped up changing it.)

To poll whether a change had been made to the database, I could just do

SELECT rowid FROM ChangeLog ORDER BY rowid DESC

Since rowid is automatically indexed, it's fast. I originally thought this might be time-consuming, and it was intended to be a stopgap for a better system, but SQLite is so ridiculously efficient that it was easily fast enough to be used in production and I never bothered to change it.

Obviously, trim out old rows from the table whenever you feel like it. You can delete all the rows except the last hundred in one SQL command.

Two other things that might help:

SQLite allows you to set and query an (integer) value. It does nothing about this. Want to change it every time the database changes ? Fine.

https://www.sqlite.org/pragma.html#pragma_user_version

SQLite automatically maintains a schema version, which you can poll. I'm mentioning it even though I suspect you won't have to deal with schema changes. Don't mess with this one yourself unless you know how SQLite uses it.

https://www.sqlite.org/pragma.html#pragma_schema_version

Worth noting, when you do any of these things, that you have to be absolutely certain when your program thinks that a file has been changed. Various parts of the operating system and file system lie to your application (and therefore to SQLite) to speed things up. These lies depend on which version of which OS you're running and which version of which storage driver you're running and other stuff too. So do some testing on the deployment platform before you spend hours writing code that works like this.

Your attention is especially drawn to different versions of Windows, or to developing an Android app on your Windows computer, or an iPhone app on your Macintosh.

(4) By greenfork on 2022-08-04 07:25:56 in reply to 2 [link] [source]

or why it might be important to you.

For reasons, of course :) I'm trying to make a text editor with an event stream as a basic abstraction for its source of truth, event sourcing.

SQLite allows you to set and query an (integer) value. It does nothing about this. Want to change it every time the database changes ? Fine.

That looks interesting. Although having a simple select would probably achieve same thing. In my current idea there's a single table, similar to your "changelog", so I will keep it in mind in case I will need the change across multiple tables.

Thanks for the tips about "modified" status, I will need to keep an eye for this on Windows.

(8) By Simon Slavin (slavin) on 2022-08-04 08:53:52 in reply to 4 [link] [source]

If you're doing even sourcing, then you need a changelog anyway. Your document (in this case, a text document, presumably) can change only when there's a new entry in the changelog. Therefore, the changelog system like the one I described might be useful.

The command to check to see whether a document has changed … keep a copy of the last rowid for your version, and compare it with one the SELECT command returns, is lightning-fast under SQLite. It should allow you to do polling without making your applications lag.

Though simply using a PRAGMA to set or change one integer variable might be even faster.

(9) By greenfork on 2022-08-04 09:13:47 in reply to 8 [link] [source]

Yes, I will totally have a "commands" table and an "events" table and maybe something else. The reason I would like to avoid polling is to not do any work while the application is idle. The approach of constantly polling for changes is great in game development because you have to poll for other changes anyway. In my case I would like to avoid being told the tales of how VSCode, reportedly, wasted 13% of CPU to make the cursor blink :)

I don't have any hard data to backup my fears though. Using just PRAGMA looks as the fastest option, that's for sure.

(3) By Gunter Hick (gunter_hick) on 2022-08-04 06:50:56 in reply to 1 [source]

IPC is not usually a function of the data storage layer. That being said, there is nothing to prevent you from implementing SQL functions and/or virtual tables that wrap whatever IPC calls you need and make them available.

You can probably even call them LISTEN, UNLISTEN and NOTIFY.

CREATE TRIGGER moveit AFTER INSERT ON jobs BEGIN SELECT NOTIFY('jobs','you have work'); END;

Which would notify (by your favorite IPC method) any clients that have previously executed SELECT LISTEN('jobs') and not SELECT UNLISTEN ('jobs').

(5) By greenfork on 2022-08-04 07:43:29 in reply to 3 [link] [source]

Would you have any tips on how do I extract the unique to the client information when the client calls SELECT LISTEN('jobs')? My understanding is that in the LISTEN function I will need to store the client connection or some kind of a handle that client passes to me for future IPC calls. How would you recommend to pass the handle?

(6) By Gunter Hick (gunter_hick) on 2022-08-04 08:05:27 in reply to 5 [link] [source]

That depends entirely on your OS and process setup.

On linux and for threads within a process, you could use the phtreads library functions. For separate processes, the signal mechanism may be more appropriate. Or message queues. Or whatever.

In all cases it would probably be a good idea to maintain a table of channels and listeners; said table could reside in your splite database file, to be more easily manipulateed and queried in the implementations of the functions.

You will have to read up on IPC as supported by your target system. And do the implementation work yourself.

(10) By greenfork on 2022-08-04 09:36:39 in reply to 6 [link] [source]

Sure, I will have the said table, that looks as a good architecture. For the IPC I will most definitely choose a Unix socket. Probably my misunderstanding is about whether I can identify the client that called LISTEN only through the database, or whether I need two programs to communicate beforehand. From what I've read, the programs must communicate by themselves, independently of the database layer, so it makes sense that the LISTEN function will insert a database record as well as do any other appropriate communication-related things outside of SQLite. Thanks for the function interface, that makes sense now.

(7) By Ryan Smith (cuz) on 2022-08-04 08:08:59 in reply to 1 [link] [source]

Adding to what others said, the idea is impossible to implement with SQLite (at least in the way other DB engines do it) since SQLite does not have a centralized engine/process.

Any running process hosting an SQLite connection may at any point exit, become engrossed in a long-running query, or may pop in and out of existence many times a second. Nor does it know of any other current connection via internal methods.

The only time an SQLite connection knows of another is if that other connection have been attached, or if it is finding transaction locks.

To implement any sort of state machine for a DB, you need a centralized manager of it. Luckily that is not too difficult, there are some projects, like litespeed, that does this and can make the DB function as a normal centralized DB engine, over networks and the like. I'm sure it can be made to add such functionality - the only trouble being that nothing precludes another normal SQLite process-connection to also open that DB and make changes, unbeknown to all.

Triggers, special state queries/updates and all that can work (and have done so) in specialized applications, but to achieve this for normal connections means that you have to control EVERY connection that can access the DB so it plays along - which is obviously impossible in the general case.

Would you recommend any viable solution?

Afraid nothing that doesn't require a custom SQLite file format which only your processes can read/write and no other normal SQLite connection can access.

If you can control all connections that can be made to the DB, then it's easy - I like Gunter's LISTEN/NOTIFY functions, but I don't see how you will be able to control ALL accesses, and the first connection made by a normal SQLite instance to your DB will not know anything about it, nor contain any custom functions.

Maybe design your own custom SQLite file format that only your own SQLite connector can read/write and so you can be sure ANY connection to it will play along, BUT, then you are no longer making an SQLite thing, you are making something else.

(11) By greenfork on 2022-08-04 10:17:26 in reply to 7 [link] [source]

Thanks for your thoughts, that helped a lot. I think I will go with an approach where I control every client and make them define the required functions. The centralized manager is either way present in my current design. For any other sqlite connections I will provide a loadable extension with an empty implementation of the said functions.

(12) By Keith Medcalf (kmedcalf) on 2022-08-04 16:32:17 in reply to 1 [link] [source]

Set up a client communication where the client needs to update its user interface when a record is inserted into the table by other clients. For example, one client inserts a record, another client prints it to the terminal.

This would be extremely annoying. I can imagine a sweatshop full of order entry clerks where each sweaters' updates interrupt and destroy the in-progress entries of the other sweaters' by triggering useless UI updates.

(13) By greenfork on 2022-08-05 10:24:21 in reply to 12 [link] [source]

Sure, there's a case where this is annoying. My example is just a simplified version of the general mechanism I'm trying to achieve. In general this approach of processing a stream of events is not novel and it is used in UI in frameworks such as React.js and even languages like Elm that support the functional reactive programming paradigm. In my specific case where I develop a client-server architectured text editor I'm trying to set up a mechanism for a) make client responsible for querying all the data it should display and b) allow extensions to subscribe to specific events and enhance them/add new events to the stream.