Detecting database changes

(1.1) By Adam Marcus (marcua) on 2024-02-02 00:16:24 edited from 1.0 [source]

Hello! I'm working on an open source project that allows multiple users to create and host their own SQLite databases on a shared server. I'm now working on periodic backups of the databases to cold storage, and want to detect when a database has changed since the last backup to prevent repeating backups of unchanged databases.

Having read related forum posts, I'm aware of data_version, but my understanding is that it's only relevant when you are able to keep a connection to the database open. This means that if my process restarts for some reason, whatever data version I retrieve on my next connection might not be comparable and might indicate that the database changed when it didn't.

It seems like .sha3sum --schema would do what I want, with the cost of having to compute the sha3sum against a potentially large database. I'm curious if there's a less expensive option. Is there any SQLite pragma or other mechanism to detect a change to the database that doesn't require either maintaining an open connection or scanning the contents of the database? I'm thinking of something that might contain the most recently committed transaction ID, the most recently modified page/tuple ID, or something similar.

Thank you for your help and advice!

(2) By Roy Hashimoto (rhashimoto) on 2024-02-02 03:54:08 in reply to 1.1 [link] [source]

If you enable the SQLITE_DBPAGE extension then you can read the change counter from page 1. You might also want to set SQLITE_DBCONFIG_DEFENSIVE as otherwise that extension lets you write raw pages as well as read them.

(3) By Bo Lindbergh (_blgl_) on 2024-02-02 05:01:33 in reply to 2 [link] [source]

Did you miss the second paragraph of the file format doc section you linked to?

In WAL mode, changes to the database are detected using the wal-index and so the change counter is not needed. Hence, the change counter might not be incremented on each transaction in WAL mode.

(4) By Roy Hashimoto (rhashimoto) on 2024-02-02 06:19:10 in reply to 3 [link] [source]

Did you miss the second paragraph of the file format doc section you linked to?

No, I didn't. Using sqlite_dbpage() should read the current page 1. If that page is in the WAL, it should read it from the WAL. That's exactly why I suggest using SQLite to read the data instead of reading the database file directly with some operating system utility or a scripting language, i.e. because SQLite knows where the real page 1 is.

(5) By Bo Lindbergh (_blgl_) on 2024-02-02 07:59:54 in reply to 4 [link] [source]

Unfortunately, the "not needed" part means that page 1 won't be written to the WAL unless some other piece of it changes.

(7) By Roy Hashimoto (rhashimoto) on 2024-02-02 15:48:09 in reply to 5 [link] [source]

That I wasn't aware of. I stand corrected; thanks for setting me straight.

(6) By Chris Locke (chrisjlocke1) on 2024-02-02 08:58:33 in reply to 1.1 [link] [source]

I'd have a look at dbhub.io .. not that it solves your problem, but rather than reinvent the wheel, you could have a look at how they manage transactions, commits, et al.

(8) By Simon Slavin (slavin) on 2024-02-02 22:47:46 in reply to 1.1 [link] [source]

Can't be done inside SQLite. Write a script or program to look at the touch dates on the files.

(9) By Roman (moskvich412) on 2024-02-02 23:23:39 in reply to 1.1 [link] [source]

Use rsync for backup. However:

If database is copied while it is being updated, the backed-up copy will be corrupt. You need to copy when all transactions are closed.

Roman

(10) By ralf (ralfbertling) on 2024-02-05 08:21:20 in reply to 1.1 [link] [source]

Hi,

interesting project (thumbsup).

I think you need to distinguish between https://en.wikipedia.org/wiki/Necessity_and_sufficiency .

There are a number of criteria to assume a database has not changed on file level etc.

https://sqlite.org/sqldiff.html is a thing but I suspect it will scan the entire db file and judging from the other replies I assume that currently that cannot be avoided unless you make assumptions about the database itsself.

If you are willing to do that it would be possible to add triggers to run a changelog into a table by making any change inserting into the changelog table the QUOTEd value of all changed columns.

You might also want to use dumps for cold storage as in https://sqlite.org/cli.html#converting_an_entire_database_to_a_text_file

And put that into cold storage. In my experience dump files may be bigger than the sqlite database but tend to be more compressible and more friendly for version control.

Regards, ralf

(If you don't want to store text format but storage size is a priority over recovery speed, you can store all index DDL from SQLite_Schema and drop all indexes (and vacuum) - then execute the DDL when restoring the file. Any index is by definition redundant data and can be completely removed from data and definition making compression more efficient.)

(11) By Adam Marcus (marcua) on 2024-02-10 23:15:19 in reply to 1.1 [link] [source]

Thank you to everyone who replied! Based on your responses, my plan is to first check modification dates for any difference (without assuming anything about the actual timestamps to avoid the effects of clock drift and timing differences between machines) as well as a hash of the contents of the database if a modification time change is detected.