SQLite Forum

Question on internals: Sync between WAL and B Tree
Login

Question on internals: Sync between WAL and B Tree

(1) By anonymous on 2021-05-31 07:32:20 [link] [source]

I am trying to understand general implementations of DBs. From what I understand, updates are first written to WAL and then applied to the B Tree.

My question is, what happens if the write to the WAL is successful, but changes to B Tree fail? Like say in case of a restart

I imagine there prolly would be some book keeping, to mark successful WAL applies. So when system restarts, it can resume from there. But what happens if this flag/check itself fails:

  1. Write to WAL succeeded

  2. Write to B Tree also succeeded

  3. Now we need to update somewhere that certain WAL record was applied. But what if this call fails?

Then on system restart, we would end up make changes to B Tree again? How does SQLite prevents this

Thank you very much in advance

(2) By Stephan Beal (stephan) on 2021-05-31 07:44:36 in reply to 1 [link] [source]

Then on system restart, we would end up make changes to B Tree again? How does SQLite prevents this

Why would sqlite need or want to prevent that? Step (2) is a logical part of step (3). We cannot persistently know that step (2) succeeded unless recording that (in step (3)) succeeds. Thus if step (3) does not complete then step (2) is not complete. Ergo, it will replay the log the next time it can. That's a feature, not a bug.

(3) By anonymous on 2021-05-31 07:47:28 in reply to 2 [source]

Then you would end up making the changes again? If I am incrementing a column, wouldn't that get incremented twice?

(4) By Stephan Beal (stephan) on 2021-05-31 08:04:12 in reply to 3 [link] [source]

Then you would end up making the changes again? If I am incrementing a column, wouldn't that get incremented twice?

When the WAL is applied to the same db N times, it overwrites/replaces the same records each time. The end effect is as if it had been applied one time, even if it's physically applied two or more times.

(5) By anonymous on 2021-05-31 08:11:34 in reply to 4 [link] [source]

Thank you! That explains some questions. But then how would the following work?

UPDATE myTable SET ID = ID + 1;
  1. First fetch the existing the value of ID (say its 23)

  2. Write to WAL that ID = 24

something like this?

(6) By Gunter Hick (gunter_hick) on 2021-05-31 09:41:23 in reply to 5 [link] [source]

WAL means Write Ahead Log. This is performed on page level, not on record level.

Any pages changed by the current write transaction are written to the WAL file. If the transaction commits, a commit record is written; if it rolls back, the entire transaction is deleted from the WAL file

If multiple transactions modify any given page, there may be several copies of that page corresponding to the states created by each of the committed transactions.

A read transaction will always continue to see the last commited state of any page as it was at the beginning of the read transaction.

Eventually, pages from committed transactions are written back to the db file. The delay also depends on read transactions still requiring outdated versions of modified pages.

(7) By anonymous on 2021-05-31 13:21:49 in reply to 6 [link] [source]

Interesting! Any reason it is at page level, but not at record level? I don't see any difference it would make plus at page level, the WAL record size would be bigger

(8) By Richard Hipp (drh) on 2021-05-31 14:09:07 in reply to 7 [link] [source]

Processes can see transactions that have been written to the WAL file but not into the database by reading the corresponding pages out of the WAL file instead of the database. This could also be done if the WAL file is record-oriented, but it would involve reading both the database page and changes in the WAL file and merging them together. That is a lot more complex and potentially slower.

(9) By anonymous on 2021-05-31 16:56:01 in reply to 8 [link] [source]

Oh! Thank you! :D

(10) By tom (younique) on 2021-06-09 14:12:53 in reply to 9 [link] [source]

I know this thread is some days old, but want to point you to https://sqlite.org/atomiccommit.html which is about traditional journal mode but explains all details very well and understandably.