SQLite User Forum

Smart Way to Keep Databases in Sync?

(1) By martin2020 on 2024-02-16 18:51:00 [link] [source]

I am building a service that includes a web server and a client app, both storing data in SQLite databases. In the client app, I would like to periodically check the server for updated data.

For this purpose, the dbhash program seems very helpful for identifying changes on the server that require the client to do a new sync with the server. That is, I can fetch the latest dbhash from the server, compare it to the latest dbhash on the client, and very efficiently confirm that the data is in sync.

My question is, is there a smart method built in to SQLite to sync the data when that is required? One solution I am considering is sending the server the latest dbhash from the client and requesting a JSON payload containing all the updates since that dbhash. But I am not sure how that would handle deletions and that sort of thing.
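For reference, the idea behind dbhash is a single fingerprint over a database's schema and contents. The sketch below is a simplified stand-in written with Python's `sqlite3` module; it is NOT the real dbhash algorithm, just an illustration of the comparison idea (it assumes ordinary rowid tables and ignores WITHOUT ROWID tables):

```python
import hashlib
import sqlite3

def db_fingerprint(path):
    """Hash a database's schema and contents, in the spirit of dbhash.

    NOT the actual dbhash algorithm -- just a stand-in that yields the
    same digest for two databases with identical schema and rows.
    """
    con = sqlite3.connect(path)
    h = hashlib.sha256()
    # Hash the schema first, in a deterministic order.
    for (sql,) in con.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY sql"):
        h.update(sql.encode())
    # Then every row of every ordinary table (assumes rowid tables).
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for t in tables:
        for row in con.execute(f'SELECT * FROM "{t}" ORDER BY rowid'):
            h.update(repr(row).encode())
    con.close()
    return h.hexdigest()
```

Two databases built by the same sequence of inserts produce the same fingerprint; any extra row changes it.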

Anyway, any guidance is much appreciated. Thanks.

(2) By martin2020 on 2024-02-16 19:01:37 in reply to 1 [link] [source]

Is it ill-advised to just pass raw SQL queries from the server to the client that would bring the client into sync with the server?

(3.1) By cj (sqlitening) on 2024-02-17 12:35:11 edited from 3.0 in reply to 2 [link] [source]

SQLitening can log all requests to sqliteningserver.log.
Any Windows clients can run filewatcher to view changes in real time.
I have never used this to mirror the server database to clients.
I suppose a broadcast in Windows might also be used.
It would also make a difference whether the server is on an intranet or the internet.
Anyway, just throwing out ideas. Interested. (Edited grammar, too early.)

(6) By cj (sqlitening) on 2024-02-17 13:57:25 in reply to 3.1 [link] [source]

Here are some links I found.

https://kerkour.com/sqlite-for-servers

https://sqlite.news/

https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki

I am going to experiment with adding a database to the sqlitening service (server) that logs writes.

Might be able to keep clients in sync.

I'll be monitoring this thread.

(4) By User123 on 2024-02-17 12:48:08 in reply to 1 [link] [source]

This might be what you're looking for: convergent, replicated SQLite.

(5) By anonymous on 2024-02-17 12:57:54 in reply to 1 [link] [source]

I tried a Google search on

sync sqlite dbs

Interesting results.

(7) By martin2020 on 2024-02-20 15:57:03 in reply to 1 [link] [source]

After thinking more about this, I have settled on a solution to try. If you have questions or suggestions, please share them!

Firstly, I will run the SQLite dbhash program every time the server changes its database, and make the resulting hash available via one of the server's API endpoints. That way, the client/app can quickly check if its data is out of date by fetching the server dbhash and comparing that to the client's own dbhash.

If the dbhashes on both server and client match, then the databases are in sync, and there is nothing to do.

However, if the dbhashes on the server and client do not match, then the client will be able to fetch the whole db file from the server to replace the current client db file. This could be made more efficient by only fetching the new data, but I do not anticipate the data being very big (at most a few kilobytes), and I do not expect it to change very often, so I think it should be reasonably performant to just fetch the whole file in case of an update.
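A minimal sketch of that client-side logic, with the HTTP calls left as plain function parameters so the compare-then-replace flow stays visible (the hash endpoint and download mechanism are assumptions; the post doesn't specify an API):

```python
import shutil

def sync_if_stale(fetch_server_hash, local_hash, fetch_db_file, local_db_path):
    """Replace the local db file only when the server's dbhash differs.

    fetch_server_hash() returns the server's current dbhash string;
    fetch_db_file(dest) downloads the whole db file to dest. Both are
    placeholders for whatever HTTP calls the real client would make.
    """
    server_hash = fetch_server_hash()
    if server_hash == local_hash:
        return False               # already in sync, nothing to do
    tmp = local_db_path + ".download"
    fetch_db_file(tmp)             # download to a temp file first,
    shutil.move(tmp, local_db_path)  # then swap it into place
    return True
```

Downloading to a temp file and moving it into place avoids leaving a half-written database if the transfer fails mid-way.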

Thoughts?

(8) By Gerry Snyder (GSnyder) on 2024-02-20 16:48:50 in reply to 7 [link] [source]

That sounds right to me.

Doing the work to determine just the new data and send only that would be premature optimization.

Gerry

(9.1) By Simon Slavin (slavin) on 2024-02-21 15:14:50 edited from 9.0 in reply to 1 [source]

Have server keep a log of the SQL commands used to change it. Basically, write each command except for SELECT and PRAGMA to a new row of a log table. When you need the client to catch up with it, download the log to the client and execute all the commands in it on the client, then delete all rows in it on the host.

You can use ATTACH to keep the log table in a separate file, making it easy to download the log table without the main database.
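One way the attached log file could look, sketched with Python's `sqlite3` module (the `logdb` alias, table name, and helper functions are illustrative choices, not anything prescribed above; for simplicity this version logs literal SQL text only, no bound parameters):

```python
import sqlite3

def open_with_log(db_path, log_path):
    """Open the main database and ATTACH a separate log database,
    so the log file can be shipped to clients on its own."""
    con = sqlite3.connect(db_path)
    con.execute("ATTACH DATABASE ? AS logdb", (log_path,))
    con.execute("CREATE TABLE IF NOT EXISTS logdb.log(sql TEXT NOT NULL)")
    return con

def logged_execute(con, sql):
    """Run a write statement and record its exact text in the log table.
    Callers must pass statements with literal values (no ? parameters),
    so the text can later be replayed verbatim on a client."""
    con.execute(sql)
    con.execute("INSERT INTO logdb.log(sql) VALUES (?)", (sql,))
    con.commit()
```

Because the log lives in its own file, a client can fetch just that file and replay it, without touching the main database file.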

(10.1) By martin2020 on 2024-02-22 06:00:40 edited from 10.0 in reply to 9.1 [link] [source]

Thanks! If I do that, I suppose that I could store the latest log table rowid on the client, and then when I update from the server, fetch any entries after that, right?
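That incremental fetch-and-replay step might look like this, assuming a `log(sql TEXT)` table as discussed above (the function names are illustrative):

```python
import sqlite3

def fetch_log_after(server_con, last_rowid):
    """Return (rowid, sql) pairs for every log entry the client
    hasn't applied yet, in the order they were written."""
    return server_con.execute(
        "SELECT rowid, sql FROM log WHERE rowid > ? ORDER BY rowid",
        (last_rowid,)).fetchall()

def apply_log(client_con, entries):
    """Replay the statements on the client; return the new
    high-water-mark rowid to store for the next sync."""
    last = 0
    for rowid, sql in entries:
        client_con.execute(sql)
        last = rowid
    client_con.commit()
    return last
```

Storing the returned rowid on the client means the next call to `fetch_log_after` picks up exactly where the previous sync stopped.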

(11.1) By Simon Slavin (slavin) on 2024-02-22 16:30:27 edited from 11.0 in reply to 10.1 [link] [source]

That is a good idea. Better still, whenever you sync, store the last rowid on both the client and the server. Your sync process can start by checking that they're the same, and produce an error if they aren't. This would prevent problems if someone restored an old backup on one computer or the other without realising that this would mess up the update process.

This arrangement could be used (with small changes) with any situation where only one computer can make changes to a database but many computers need to have copies of it. It does not work at all where more than one computer can make changes.

One advantage of a system like this is that it's multiplatform, and involves SQLite calls but no OS calls. You don't have to worry about Windows/Unix file permissions, interpreting date formats, etc..

(12) By martin2020 on 2024-02-22 19:33:53 in reply to 11.1 [link] [source]

I'm not sure I understand your point about an old backup messing up the update process, particularly when you say, "Your sync process can start by checking to see they're the same, and produce an error if they aren't." Do you mean generating an error if the client rowid is newer than the server rowid (since that should never happen)? That could occur if the server were restored from an old backup.

However, if I'm understanding correctly, it should be a normal case that the server rowid is newer than the client rowid, which should trigger an update of the client from the server.

Is that different than how you're thinking about it?

(13) By Simon Slavin (slavin) on 2024-02-23 12:29:00 in reply to 12 [link] [source]

You got it right. It is a check to see that nobody messed with either computer. If everything's normal, the two numbers match. If someone restored an old backup on one computer but not the other, the two won't match, and an update on the client machine will produce a database with bad data in it.

The sync process writes the same rowid -- the current maximum rowid of the log table -- to a file or table on both the client and the server. The next sync can read both values and check that they are still the same. If not, it should not proceed, because syncing would produce incorrect data.
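A sketch of that safety check, using a one-row table for the stored rowid (the post only says "a file or table", so the `sync_state` table name and these helpers are assumptions):

```python
import sqlite3

def record_checkpoint(con, rowid):
    """Store the rowid both sides agreed on at the end of a sync."""
    con.execute("CREATE TABLE IF NOT EXISTS sync_state(last_rowid INTEGER)")
    con.execute("DELETE FROM sync_state")
    con.execute("INSERT INTO sync_state VALUES (?)", (rowid,))
    con.commit()

def checkpoints_match(server_con, client_con):
    """The check described above: refuse to sync if the stored rowids
    disagree, since that means one side was restored from an old backup
    and replaying the log would produce a database with bad data."""
    s = server_con.execute("SELECT last_rowid FROM sync_state").fetchone()
    c = client_con.execute("SELECT last_rowid FROM sync_state").fetchone()
    return s is not None and s == c
```

If `checkpoints_match` returns False, the sync should stop and report an error rather than replay the log.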