SQLite Forum

Use SQLITE_DBPAGE to backup

(1) By Emile Fugulin (sytten) on 2023-08-01 21:14:24

Hello!

I am working on a backup system for our application. We have a number of SQLite databases, and we want to compress them into a zip for the backup without first copying each one to another database (they can be multiple gigabytes), so we can't use the backup API.

I was reading the thread about Hot backups by copy, which appeared to be similar to our needs, but in our case there is only one process: backups are made by the same process that accesses the databases.

In theory the other connections will be closed for the backup, but that is hard to guarantee when using connection pools in a large application. So my understanding is that POSIX locks will be an issue if we open and close a new file handle on the files we want to back up.

I found the SQLITE_DBPAGE extension, which I think we could use. Basically I was thinking (a rough sketch in code follows the list):

  1. Open connection
  2. PRAGMA journal_mode=WAL
  3. PRAGMA wal_checkpoint(TRUNCATE)
  4. BEGIN IMMEDIATE
  5. SELECT pgno, data FROM SQLITE_DBPAGE LIMIT ? OFFSET ?
  6. Write the pages to my zip file
  7. Loop to 5 until we read all pages

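A minimal sketch of those steps, using rusqlite rather than Diesel for brevity, and assuming SQLite is compiled with SQLITE_ENABLE_DBPAGE_VTAB so the sqlite_dbpage virtual table exists (write_page is a stand-in for the zip-writing step):

    use rusqlite::Connection;

    fn backup_pages(path: &str, mut write_page: impl FnMut(i64, &[u8])) -> rusqlite::Result<()> {
        let conn = Connection::open(path)?;                                  // step 1
        // These two PRAGMAs return a result row, so read and discard it.
        conn.query_row("PRAGMA journal_mode=WAL;", [], |_| Ok(()))?;         // step 2
        conn.query_row("PRAGMA wal_checkpoint(TRUNCATE);", [], |_| Ok(()))?; // step 3
        conn.execute_batch("BEGIN IMMEDIATE;")?;                             // step 4

        let batch: i64 = 50; // placeholder batch size
        let mut offset: i64 = 0;
        let mut stmt = conn
            .prepare("SELECT pgno, data FROM sqlite_dbpage LIMIT ?1 OFFSET ?2")?; // step 5
        loop {
            let mut rows = stmt.query((batch, offset))?;
            let mut n = 0;
            while let Some(row) = rows.next()? {
                write_page(row.get(0)?, &row.get::<_, Vec<u8>>(1)?);         // step 6
                n += 1;
            }
            if n < batch { break; }                                          // step 7
            offset += batch;
        }
        drop(stmt); // make sure the statement is finalized before ending the transaction
        conn.execute_batch("COMMIT;")?;
        Ok(())
    }
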
Am I missing something? I did some research, but nobody seems to have thought about using SQLITE_DBPAGE for backups.

Thanks!

(2) By anonymous on 2023-08-01 22:56:37 in reply to 1

  1. No checkpointing is needed. sqlite_dbpage will get pages from the database or the WAL as appropriate.
  2. This is a read transaction, so BEGIN DEFERRED is enough.
  3. LIMIT/OFFSET is inefficient. Fetch all pages in one go.
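
Applying all three points, the sketch above collapses to one deferred read transaction and a single streaming query (same rusqlite assumptions as before):

    use rusqlite::Connection;

    fn backup_pages(conn: &Connection, mut write_page: impl FnMut(i64, &[u8])) -> rusqlite::Result<()> {
        conn.execute_batch("BEGIN DEFERRED;")?; // read snapshot; no checkpoint needed
        {
            let mut stmt = conn.prepare("SELECT pgno, data FROM sqlite_dbpage ORDER BY pgno")?;
            let mut rows = stmt.query([])?;
            while let Some(row) = rows.next()? {
                write_page(row.get(0)?, &row.get::<_, Vec<u8>>(1)?);
            }
        } // statement dropped (and reset) before COMMIT
        conn.execute_batch("COMMIT;")?;
        Ok(())
    }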

(3) By Emile Fugulin (sytten) on 2023-08-02 01:31:12 in reply to 2

Thanks for the input

  1. Fair, that's a good idea.
  2. What is the behaviour if a write is done between two reads? I feel I need to prevent ALL writes for the full duration of the backup to prevent corruption. I am copying batches of pages (probably 25 to 50) into the zip output.
  3. Fetching everything in one go is impossible for us because our databases don't fit in memory; I want the backup to be cancellable and to keep memory use bounded. I expect OFFSET to have some kind of optimization on this virtual table, but I might be wrong. I could use pgno as a cursor for pagination instead (sketched below); I will look into the C code tomorrow.
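
Something like this, with the same rusqlite setup as above (whether sqlite_dbpage actually optimizes the pgno range constraint may depend on the SQLite version, which is what I want to check in the C code):

    use rusqlite::Connection;

    // Fetch one batch of pages with pgno greater than `after`; the last
    // pgno seen is returned as the cursor for the next call (None = done).
    fn fetch_batch(
        conn: &Connection,
        after: i64,
        limit: i64,
        mut write_page: impl FnMut(i64, &[u8]),
    ) -> rusqlite::Result<Option<i64>> {
        let mut stmt = conn.prepare(
            "SELECT pgno, data FROM sqlite_dbpage WHERE pgno > ?1 ORDER BY pgno LIMIT ?2",
        )?;
        let mut rows = stmt.query((after, limit))?;
        let mut last = None;
        while let Some(row) = rows.next()? {
            let pgno: i64 = row.get(0)?;
            write_page(pgno, &row.get::<_, Vec<u8>>(1)?);
            last = Some(pgno);
        }
        Ok(last)
    }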

(4) By anonymous on 2023-08-02 07:13:46 in reply to 3

You're inside a transaction. Writes made through other connections won't be visible until you COMMIT or ROLLBACK.

SQLite buffers one result row at a time. Just compress and write out the data as you receive it and you won't run out of memory.
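
For illustration, streaming straight into one zip entry keeps at most one page in memory at a time. A sketch using the zip crate (2.x API; the file names are placeholders). Since the pages come back in pgno order with no gaps, concatenating them reproduces the database image byte for byte:

    use std::{error::Error, fs::File, io::Write};
    use rusqlite::Connection;
    use zip::write::{SimpleFileOptions, ZipWriter};

    fn backup_to_zip(conn: &Connection, zip_path: &str) -> Result<(), Box<dyn Error>> {
        let mut zw = ZipWriter::new(File::create(zip_path)?);
        zw.start_file("app.db", SimpleFileOptions::default())?;

        conn.execute_batch("BEGIN DEFERRED;")?;
        {
            let mut stmt = conn.prepare("SELECT data FROM sqlite_dbpage ORDER BY pgno")?;
            let mut rows = stmt.query([])?;
            while let Some(row) = rows.next()? {
                // One page at a time: the blob is compressed and written
                // out before the next row is fetched.
                zw.write_all(&row.get::<_, Vec<u8>>(0)?)?;
            }
        } // statement dropped before COMMIT
        conn.execute_batch("COMMIT;")?;
        zw.finish()?;
        Ok(())
    }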

(5) By Emile Fugulin (sytten) on 2023-08-02 14:05:12 in reply to 4

That's neat, thanks for the explanation! I checked the ORM we are using (Diesel.rs) and it does support streaming, so I will give it a go.