When do readers actually stop using the WAL? Is there a contradiction?

(1) By Philip O'Toole (otoolep) on 2024-02-01 13:45:46 [link] [source]

There is something I don't understand in the otherwise clear WAL documentation (at https://www.sqlite.org/wal.html). I would appreciate any help.

To quote the docs:

"When a reader needs a page of content, it first checks the WAL to see if that page appears there, and if so it pulls in the last copy of the page that occurs in the WAL prior to the reader's end mark."

This seems to say that a reader "will unconditionally use the page of content in the WAL regardless of whether that page had been copied to the main database file (perhaps as a result of an earlier passive checkpoint that did not complete)".

Later the docs state:

"Whenever a write operation occurs, the writer checks how much progress the checkpointer has made, and if the entire WAL has been transferred into the database and synced and if no readers are making use of the WAL..."

But how can a reader "not be making use of a WAL" if the first statement is correct? Given that a page exists in both the WAL and the database file, the first statement implies that the WAL page will be used by the reader. In other word if the WAL file exists (and has not been reset so that all the WAL frames are valid with respect to the WAL file header) readers will always use the page in the WAL. Readers could never stop using the WAL.

Am I missing something? It would make sense to me if the WAL will use the page in the WAL -- but only if it hasn't been copied to the database yet. Or another way of interpreting the statement "no readers are making use of the WAL" is really just saying "there are no active readers".

(2) By Jauhar Arifin (jauhararifin) on 2024-02-01 14:02:40 in reply to 1 [link] [source]

WAL doesn't store the whole change since the beginning of the database. Eventually, the changes stored in the WAL will be applied to the main database file and the changes that already applied to the main database file will be deleted from the WAL. That's when the reader stop reading from the WAL. If you can't find page X from the WAL, it means that page already stored to the main database and the WAL entry for page X is deleted.

(4.5) By Philip O'Toole (otoolep) on 2024-02-01 15:54:08 edited from 4.4 in reply to 2 [link] [source]

Yep, I realize the WAL doesn't store the entire history.

"and the changes that already applied to the main database file will be deleted from the WAL"

I do not believe this is correct. It only happens a) logically (if there is a successful RESTART) or b) a TRUNCATE.

A PASSIVE checkpoint (which doesn't complete) copies pages to the main SQLite file, but leaves the pages intact in the WAL, and also leaves the WAL file header untouched since the WAL has not been reset. Those pages in the WAL which were copied to the database remain perfectly valid. This is easy to show in practise. Set up a SQLite database in WAL mode, do some writes, stop writing, sha265 the WAL file, execute a PASSIVE checkpoint which doesn't complete, and sha256 the WAL file again. The sha256 sums will be identical.

Perhaps the WAL index kicks in here, and actually the search for the pages in the WAL to make available to the reader are only those pages which are after the "nBackfill" value matter. See https://www.sqlite.org/walformat.html

(6) By ddevienne on 2024-02-01 16:43:11 in reply to 4.5 [link] [source]

But there's a 3rd actor in this dance, on top of the DB and its WAL. And that's the SHM, no?
Isn't the PASSIVE checkpoint updating the SHM, so that readers who can will read from the DB file,
instead of the (identical for those pages) the WAL? I.e. the snapshot ID cutoff between transactions
entirely in the DB files, and those split between the DB and the WAL, still moves on that PASSIVE checkpoint, via the SHM?

I just like to conjecture, but I don't know the real details. So take that with a grain of salt, of course.
Hopefully Dan or Richard will settle this. --DD

(3) By Gunter Hick (gunter_hick) on 2024-02-01 14:29:02 in reply to 1 [link] [source]

A reader "stops using the WAL" when it ends it's (implicit or explicit) transaction.

(5.1) By Philip O'Toole (otoolep) on 2024-02-01 15:49:37 edited from 5.0 in reply to 1 [source]

To be more precise, let me give you example of what I mean.

A SQLite database is running along, doing writes to the WAL. A PASSIVE checkpoint happens, copies most of the pages from the WAL to the SQLite database file, but can't complete because there are some pages in the WAL that are later than an active reader's "end mark" in the WAL. All fine, but note that the WAL file itself has not changed, in the sense that the pages copied to the database file are not deleted from the WAL. Now let's say all writes stop too.

Now imagine another reader comes along, and starts a read of a page that exists at the very start of the WAL. But that page also exists -- the latest version of that page -- in the main database file too, due to the earlier PASSIVE checkpoint. From where is that data returned to the reader? From the WAL or the main database file? The first statement in the docs imply it's from the WAL. So here you have a reader using the WAL, when it could have just as easily used the database file -- but that reader is now preventing a TRUNCATE checkpoint succeeding (for example). Right?

(7) By Dan Kennedy (dan) on 2024-02-01 16:47:32 in reply to 5.1 [link] [source]

Now imagine another reader comes along, and starts a read of a page that exists at the very start of the WAL. But that page also exists -- the latest version of that page -- in the main database file too, due to the earlier PASSIVE checkpoint. From where is that data returned to the reader? From the WAL or the main database file? The first statement in the docs imply it's from the WAL. So here you have a reader using the WAL, when it could have just as easily used the database file -- but that reader is now preventing a TRUNCATE checkpoint succeeding (for example). Right?

In this case the reader reads the page from the db file. But that's just an optimization - the reader locks the wal file and establishes its end mark when it opens its transaction, not when the first page is read from the wal file. It has to be this way because the new reader expects to be able to read the last few frames of the wal file (those that have not already been checkpointed). If a "PRAGMA wal_checkpoint = TRUNCATE" were allowed to proceed, they might not be available when required by the reader.

Dan.

(8) By Philip O'Toole (otoolep) on 2024-02-01 17:05:52 in reply to 7 [link] [source]

Ah yes, OK, I didn't appreciate that point -- thanks. I had this mental model that when a reader starts it knows ahead of time (after initial query processing) what pages it will need to access (and could check first if all those pages were in the main database file).

Now that I think about it, of course. By definition executing a query means reading database pages, and that read could easily involve uncheckpointed pages in the WAL.

(9) By Jauhar Arifin (jauhararifin) on 2024-02-01 20:45:48 in reply to 5.1 [link] [source]

Now imagine another reader comes along, and starts a read of a page that exists at the very start of the WAL. But that page also exists -- the latest version of that page -- in the main database file too, due to the earlier PASSIVE checkpoint. From where is that data returned to the reader? From the WAL or the main database file? The first statement in the docs imply it's from the WAL.

I see what you meant. I think what sqlite does is it read the page from the main database file, and the doc probably doesn't tell the whole story. Or, maybe the doc is ambiguous on the word "appears". If the page is applied to the main database through passive checkpointing, even though physically the page is still persisted in the WAL, maybe logically it's not counted as "appears in the WAL". Of course this is just my opinion, I didn't read it from the doc or the source.

(11) By Gunter Hick (gunter_hick) on 2024-02-02 00:12:41 in reply to 9 [link] [source]

The docs would suggest otherwise.

The WAL file is a time ordered list of changes.

A write transaction 
a) checks if the last frame has been checkpointed, and resets the WAL file if so
b) creates a new frame
c) writes changed pages to the frame and records the frame number, page number and location in the wal index structure
d) closes the frame on a commit or deletes it on a rollback

A read transaction
a) remembers the last committed frame when it begins
b) checks the wal index for the page number it wants
c) if found, retrieves the latest version of the page written before the read from the WAL file
d) if not found, retrieves the page from the database file

Any page the read transaction requires that has been written before the read transaction had begun is thus always taken from the WAL file.

A checkpoint
a) checks last frame number copied by the previous checkpoint
b) checks the frame number of the oldest reader
c) copies any frames between these two locations to the database file
d) records the last frame it copied

(10) By Gunter Hick (gunter_hick) on 2024-02-01 23:42:11 in reply to 5.1 [link] [source]

You are describing checkpoint starvation. As long as at least one reader is active (i.e. within a transaction) the checkpoint cannot complete. Each reader will retrieve the LATEST version of any modified page PRIOR to their "end mark". The oldest reader determines the LATEST version of any modified page that may be copied back to the database.

Readers use information in the WAL index (-shm file) to look up the location of the correct version (i.e. corresponding to their respective end marks) of a modified page within the WAL file. A reader will thus always read (only) the appropriate version of the page from the WAL file, irrespective of whether that is also the version already copied back to the database file or not.