Most efficient way to tell if any database content has changed
(1) By Simon Willison (simonw) on 2020-10-12 18:54:32 updated by 1.1 [link] [source]
I'm interested in answering the question "has the contents of this database file changed in any way" as efficiently as possible. I want to do this for caching purposes - I'd like to be able to cache various things relating to an infrequently changed database but invalidate that cache if even a single row has been inserted or updated. I've successfully used `PRAGMA schema_version` to invalidate a cache when the database schema changes, but now I'm looking to do it based on the file contents instead. Options I'm considering: - Calculating the MD5 hash of the file contents on disk, which runs pretty fast (but not fast enough to do it often) - Adding triggers to every single database table which increment a counter any time those tables have content inserted/updated/deleted I'm pretty sure that second option would work correctly (I'm not at all worried about the additional overhead on writes), but I'm wondering if there's a solution I haven't considered that would be even better.
Most efficient way to tell if any database content has changed
(1.1) By Simon Willison (simonw) on 2020-10-12 18:55:07 edited from 1.0 updated by 1.2 [link] [source]
I'm interested in answering the question "has the contents of this database changed in any way" as efficiently as possible. I want to do this for caching purposes - I'd like to be able to cache various things relating to an infrequently changed database but invalidate that cache if even a single row has been inserted or updated. I've successfully used `PRAGMA schema_version` to invalidate a cache when the database schema changes, but now I'm looking to do it based on the file contents instead. Options I'm considering: - Calculating the MD5 hash of the file contents on disk, which runs pretty fast (but not fast enough to do it often) - Adding triggers to every single database table which increment a counter any time those tables have content inserted/updated/deleted I'm pretty sure that second option would work correctly (I'm not at all worried about the additional overhead on writes), but I'm wondering if there's a solution I haven't considered that would be even better.
Most efficient way to tell if any database content has changed
(1.2) By Simon Willison (simonw) on 2020-10-12 18:55:36 edited from 1.1 [source]
I'm interested in answering the question "has the contents of this database changed in any way" as efficiently as possible.
I want to do this for caching purposes - I'd like to be able to cache various things relating to an infrequently changed database but invalidate that cache if even a single row has been inserted or updated.
I've successfully used PRAGMA schema_version
to invalidate a cache when the database schema changes, but now I'm looking to do it based on the file contents instead.
Options I'm considering:
- Calculating the MD5 hash of the file contents on disk, which runs pretty fast (but not fast enough to do it often)
- Adding triggers to every single database table which increment a counter in a dedicated
content_version
table any time those tables have content inserted/updated/deleted
I'm pretty sure that second option would work correctly (I'm not at all worried about the additional overhead on writes), but I'm wondering if there's a solution I haven't considered that would be even better.
(2) By Keith Medcalf (kmedcalf) on 2020-10-12 19:09:03 in reply to 1.2 updated by 2.1 [link] [source]
The simplest way is probably to hold-open a second connection to the database and periodically execute `PRAGMA data_version`. If the value changes then *a different connection* has updated the data in the database. Since this connection is used for nothing other than polling the data_version, you only need a very_small cache_size for it. <https://sqlite.org/pragma.html#pragma_data_version>
(2.1) By Keith Medcalf (kmedcalf) on 2020-10-12 19:13:00 edited from 2.0 in reply to 1.2 [link] [source]
The simplest way is probably to hold-open a second connection to the database and periodically execute PRAGMA data_version
. If the value changes then a different connection has updated the data in the database. Since this connection is used for nothing other than polling the data_version, you only need a very small cache_size for it.
https://sqlite.org/pragma.html#pragma_data_version
Note that this is basically how a connection determines internally that the database file has been modified by another connection and must flush its page cache.
(3) By Simon Willison (simonw) on 2020-10-12 22:17:38 in reply to 2.1 [link] [source]
Not sure how I missed that, that looks perfect! Thanks very much.
(4) By oneeyeman on 2020-10-12 22:21:14 in reply to 3 [link] [source]
Hi,
And you can just run it in a secondary thread which shouldn't give any overhead on the main application...
Thank you.
(5) By Kevin Youren (KevinYouren) on 2020-10-12 22:28:14 in reply to 1.2 [link] [source]
Would a simple directory listing suffice?
It is independent of file size.
I mainly use Unix terminals, so:
ls -lai cfd_20201009_0146.sqlite
gives: 6553669 -rw-r--r-- 1 root root 200912896 Oct 9 12:53 cfd_20201009_0146.sqlite
Windows has similar and Android too. (with a bit of effort)