SQLite Forum

Backup API with network filesystem copy

(1.1) By Larry Brasfield (larrybr) on 2021-12-14 16:24:00 edited from 1.0

Recently, while pondering ways to use Fossil (which uses SQLite) as part of a fine-grained backup system, it occurred to me that the Backup API might be useful. This question arose:

Is it true, because of the way the Backup API works, that having the destination (or "copied to") DB on the far side of a network filesystem layer is safe under some or all circumstances? (I ask despite these stern warnings.) I'm thinking that, because the API (implementation) grabs the target DB file during the whole backup operation and essentially replaces it, the hazards associated with flaky (but still trying) locking by network filesystem implementations are substantially eliminated.

If it's not safe under all circumstances, what might be done to arrange that it is safe?

(2) By AlexJ (CompuRoot) on 2021-12-14 16:52:00 in reply to 1.1

... as part of a fine-grained backup system

I did this a long time ago and ran into the issue that SQLite will choke on files bigger than 2 GB, so just keep that in mind. In my case I resolved it by first scanning the directories with find; if there are files bigger than 2 GB, an error is thrown for review. Usually those files are better handled by rsync instead of fossil and need to be added to the ignore list.
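For anyone wanting to automate that pre-check without shelling out to find, here is a minimal sketch in Python (the function name is mine, and the 2 GB threshold is just the figure discussed above):

```python
import os

TWO_GB = 2 * 1024 ** 3  # size threshold discussed above

def oversized_files(root, limit=TWO_GB):
    """Walk `root` and return paths whose size exceeds `limit` bytes,
    i.e. candidates for rsync / the fossil ignore list rather than
    check-in."""
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > limit:
                    hits.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return hits
```

Raising an error whenever `oversized_files(...)` returns a non-empty list reproduces the find-based review step described above.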

If it's not safe under all circumstances, what might be done to arrange that it is safe?

If it's going to be network storage then, in my case, I simply use a standard fossil sync, and on the remote server where the actual backup lives I run a cron job that periodically executes the newer VACUUM INTO command (a level-0 backup, so to speak). This way fossil sends only changes instead of pushing the whole database.
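As a sketch of that level-0 step: VACUUM INTO can be driven from any SQLite binding, e.g. Python's sqlite3 module (the function name and paths here are illustrative; note that VACUUM INTO refuses to overwrite an existing file, hence the remove):

```python
import os
import sqlite3

def level0_backup(db_path, snapshot_path):
    """Write a compacted, transactionally consistent snapshot of
    db_path to snapshot_path using VACUUM INTO (SQLite >= 3.27)."""
    if os.path.exists(snapshot_path):
        os.remove(snapshot_path)  # VACUUM INTO will not overwrite
    con = sqlite3.connect(db_path)
    try:
        # The INTO filename is an SQL expression, so it can be bound.
        con.execute("VACUUM INTO ?", (snapshot_path,))
    finally:
        con.close()
```

Pointing `snapshot_path` at local storage on the backup server is what makes this a level-0 snapshot alongside the incremental fossil sync.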

So, just my 2 cents

(3) By Larry Brasfield (larrybr) on 2021-12-14 17:00:18 in reply to 2

While I appreciate your experience and thinking on this, I still hope for an answer to my question. The reason I am tempted to use the Backup API is that it is available in the SQLite CLI shell (as the .backup meta-command), so that would enable a very simple solution.
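For reference, the same Backup API that the CLI's .backup meta-command drives is also exposed by other bindings, e.g. Python's sqlite3 module; a rough single-call equivalent (function name and paths are illustrative) looks like:

```python
import sqlite3

def shell_style_backup(src_path, dest_path):
    """Approximate the CLI's `.backup dest_path`: drive the SQLite
    Backup API to make dest_path a page-level copy of src_path."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # pages=-1 (the default) copies everything in one step,
        # holding the locks for the duration of the copy.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```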

(4) By Simon Slavin (slavin) on 2021-12-15 00:14:00 in reply to 1.1

I think from your question that you are backing up from a local source to a network destination.

The reason SQLite users are told to avoid network file systems is that the locking systems are often faulty. If the backup is backing up to a network file system, and nothing is trying to open the backup target, there's no contention for access to the target, so you should be okay.

On the other hand, if something is trying to open the backup target while the backup is proceeding, you may run into problems. To minimise problems I would delete or rename any file with the same path/filename as the target before you start the backup, though that may not be sufficient to avoid problems.

(5) By Larry Brasfield (larrybr) on 2021-12-15 02:51:39 in reply to 4

Thanks, Simon. You've described why it would usually work for my application, backing up a Fossil DB to a local semi-offsite store over the local network.

On the other hand, if something is trying to open the backup target while the backup is proceeding, you may run into problems.

According to the Backup API doc, the "backup target" is locked for write during the whole time that the backup operation is in process, and before that operation completes and releases the lock, the whole DB file has been replaced. If not for issues with locking (or if network filesystems worked just like local devices), this would be enough to avoid problems (of the sternly-warned-against kind).

As I see it, even with a flaky locking implementation, the prospects for a corrupt DB are greatly reduced, even if multiple clients try to access and/or backup to it. Most, if not all, of that hazard in the usual multi-client usage relates to different writers modifying parts of the DB file without respecting the shared aspects of such modification (which parts are modified, recording those modifications consistently, and overwrites of shared data structures). Because of the whole-DB replacement effect of the Backup API, those problems nearly vanish.

I'm pretty sure that once the back-upper can begin its page writing it will be able to complete the whole-DB replacement before releasing its write lock. And if some other ill-conceived clients[a] see a much different DB immediately after that lock is released, that is a higher category of problem than is solved by making locks work as advertised/desired.

a. It is ill-conceived to have DB snapshots (created via the Backup API) also be used for ordinary row-at-a-time or field-at-a-time modifications. The Backup API is not a replication scheme; it's a means of restoring a source DB to a previous state when something goes wrong with its store or its own clients.

(6) By Bill Wade (billwade) on 2021-12-15 14:04:01 in reply to 4

In general the steps seem to be:

1) Make a copy of the source database (backup)
2) Get the copy to the destination machine
3) On the destination machine, validate the copy
4) On the destination, make the copy available to client programs

Depending on how much you trust your environment, you might decide to merge some of those steps together. Perhaps you are confident enough to combine all four into a single step.

With a "pretty good" environment I'd likely go with
a) Copy to a temporary file name on the destination machine.
b) Rename temporary file name to the final file name.

For poorer networks, I'd go with the listed steps (and perhaps add compression/decompression as part of (2)).
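Steps (3) and (4) above can be sketched like this in Python (the function name is mine); note that os.replace is an atomic rename on local POSIX filesystems, and that guarantee is exactly what a flaky network filesystem may fail to honour:

```python
import os
import sqlite3

def install_backup(copied_tmp, final_path):
    """Validate a freshly copied database file, then expose it to
    client programs by renaming it over the final name."""
    con = sqlite3.connect(copied_tmp)
    try:
        result = con.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        con.close()
    if result != "ok":
        raise RuntimeError("copy failed integrity_check: " + result)
    os.replace(copied_tmp, final_path)  # atomic rename on local POSIX
```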

(7) By Rowan Worth (sqweek) on 2021-12-16 05:28:43 in reply to 5

Note that the backup API may have to restart the process if the source database changes -- unless you ask it to copy all pages in a single call, in which case it will hold a read-lock on the source DB for the duration of the backup.

This gets more relevant if the destination is elsewhere on the network, as the pager will still be writing one page at a time, which is unlikely to be an optimal block size for non-local latencies. I.e., in general I would expect .backup to a local disk followed by a typical network transfer to be competitive with, if not faster than, a straight .backup to the network (and it would certainly mean a shorter read-lock on the source DB).
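Both modes are visible through Python's sqlite3 wrapper around the Backup API; a sketch of the trade-off described above (function names are mine):

```python
import sqlite3

def backup_single_call(src_con, dest_con):
    """Copy all pages in one call: the read-lock on the source is held
    for the whole copy, so the backup can never be forced to restart."""
    src_con.backup(dest_con, pages=-1)

def backup_in_batches(src_con, dest_con, pages=64, sleep=0.25):
    """Copy a batch of pages at a time, sleeping between batches:
    friendlier to concurrent writers on the source, but a write landing
    between batches makes the backup start over from page one."""
    src_con.backup(dest_con, pages=pages, sleep=sleep)
```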