SQLite Forum

wal checkpointing very slow

(1) By anonymous on 2021-08-15 09:03:24

I'm using sqlite in wal mode and when I run benchmarks on my program, I see that sqlite's autocheckpoints are by far the biggest bottleneck. The database is < 10MB and the wal file is 34MB.

When I disable autocheckpoints, I see that the wal file very quickly (within a few seconds) grows past 1 GB.

Why is the wal growing so fast while the database stays small? I don't delete data often, and certainly not on the order of gigabytes.

My program does many small insert transactions (it's not possible to batch them).
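A minimal sketch of the situation described above, using Python's stdlib sqlite3 module (the file name and row counts are made up for illustration): many one-row autocommit transactions in WAL mode, with autocheckpointing disabled, and then a look at how large the -wal file has grown compared to the main database file.

```python
import os
import sqlite3
import tempfile

# Hypothetical repro: WAL mode, autocheckpoints off, many tiny transactions.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit: each execute() commits
con.execute("PRAGMA journal_mode=WAL")
con.execute("PRAGMA wal_autocheckpoint=0")  # disable autocheckpoints
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT)")

for i in range(1000):
    # Each INSERT is its own transaction, so each one appends
    # its own set of changed pages to the WAL file.
    con.execute("INSERT INTO t(v) VALUES (?)", (f"row {i}",))

wal_size = os.path.getsize(path + "-wal")
db_size = os.path.getsize(path)
con.close()
```

With no checkpoints running, the WAL file ends up far larger than the database file, which mirrors the 34 MB-vs-10 MB (and later 1 GB) disproportion reported above.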

(2) By Simon Slavin (slavin) on 2021-08-15 13:10:45 in reply to 1

[the following is oversimplified to make it easy to understand]

In WAL mode, until you reach a checkpoint, SQLite doesn't update the database. All changes are logged in the WAL file. When a checkpoint is processed, SQLite has to go through the log of changes, figure out which ones haven't been obsoleted by later changes, and make appropriate changes to the database file. Then it can start again with a new blank WAL file.

That's why the checkpoints are the bottleneck: really SQLite doesn't do much writing between checkpoints.
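That "start again with a new blank WAL file" step can also be driven by hand. A small sketch (Python's sqlite3 module; the file name is made up): read the autocheckpoint threshold that normally triggers a checkpoint, then force a full checkpoint that truncates the WAL back to empty.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cp.db")
con = sqlite3.connect(path, isolation_level=None)
con.execute("PRAGMA journal_mode=WAL")

# The autocheckpoint threshold is a WAL page count (1000 pages by default).
threshold = con.execute("PRAGMA wal_autocheckpoint").fetchone()[0]

con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")

# Force a full checkpoint and truncate the WAL file back to zero bytes.
busy, log_frames, ckpt_frames = con.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
wal_after = os.path.getsize(path + "-wal")
con.close()
```

The checkpoint pragma returns a (busy, log, checkpointed) row; busy is 0 when nothing blocked the checkpoint, and after TRUNCATE the -wal file is back to zero bytes.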

In your situation of many small unpredictable changes, it may be that WAL mode is not the best mode to use. You might be better off with

PRAGMA journal_mode = TRUNCATE

Perhaps you could try it.
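Trying it is cheap: journal_mode is set per connection with a PRAGMA, and the statement echoes back the mode actually in effect. A sketch (Python's sqlite3; leaving WAL mode requires that no other connection has the database open):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
con = sqlite3.connect(path)

# Each journal_mode pragma returns the mode now in effect (lowercase).
mode1 = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
mode2 = con.execute("PRAGMA journal_mode=TRUNCATE").fetchone()[0]
con.close()
```

Always check the returned value rather than assuming the switch succeeded, since SQLite silently keeps the old mode if the change is not possible.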

(3) By Keith Medcalf (kmedcalf) on 2021-08-15 18:04:56 in reply to 2

> In WAL mode, until you reach a checkpoint, SQLite doesn't update the database. All changes are logged in the WAL file. When a checkpoint is processed, SQLite has to go through the log of changes, figure out which ones haven't been obsoleted by later changes, and make appropriate changes to the database file. Then it can start again with a new blank WAL file.

Partially correct.

When write-ahead logging is in effect, pages changed by each transaction are written to the write-ahead log. This means that even though a "transaction" may only affect one itty bitty row, it may impact multiple database pages (the page(s) containing the data and any pages containing btree or index data related to the table). This means that a "small update to one row" may result in "a shitload of changed pages". Each shitload is a separate set of changes (transaction). So a "whole bunch of small transactions" may result in "a whole bunch of shitloads of pages being modified and written to the write-ahead log".

Eventually, a wal checkpoint will occur which will trigger the copying of pages from the wal to the database file.

The first step is to determine the "transaction boundary" up to which changed pages may be written to the database. Let's call this point B, and the start of the wal file point A.

SQLite3 has now determined that all the changed pages between point A and point B can be written to the main database file.

There are two ways this could be done.

Method 1:
- collect up the list of all changed pages by page number
- find the latest version of each page in the list
- copy that latest version of the page to the main database file
- mark all the transactions between A and B as checkpointed

Method 2:
- starting at point A, copy each changed page one after the other to the main database file
- mark each transaction in the wal file as checkpointed when all its pages are copied
- stop copying when you reach point B

Whether method 1 or method 2 is used is an implementation detail. Although I have not examined the code to see what method is being used in what circumstance, it really does not matter.
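Method 1 can be sketched as a toy model in a few lines (this is an illustration of the idea, not SQLite's actual code): treat the WAL as a list of (page number, content) frames in commit order, and keep only the latest version of each page before writing to the database file.

```python
def coalesce(frames):
    """Toy model of Method 1: dedupe WAL frames by page number,
    keeping only the last (most recent) version of each page."""
    latest = {}
    for page_no, content in frames:
        latest[page_no] = content  # later frames overwrite earlier ones
    return latest

# Hypothetical WAL: page 1 was rewritten by three separate transactions.
frames = [(1, "a0"), (2, "b0"), (1, "a1"), (3, "c0"), (1, "a2")]
pages = coalesce(frames)
```

Five WAL frames collapse to three page writes here, which is exactly why a large WAL can checkpoint into a much smaller database file.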

The root issue is that your "small inserts" are each affecting a large number of pages and all of these pages must be tracked.

For example, a "1 byte inserted record" affects (1) the page on which the record is stored, (2) any btree pages used to locate the record in the table and (3) index btree pages.

When you insert one record into a table, there is AT LEAST one page changed to contain the data, and AT LEAST one page changed for each index associated with the table. So if you have a standard table with 3 indexes, then each record insert (which may be only a few bytes) will cause at least 4 pages (16384 bytes, assuming the default 4096-byte page size) to be written to the WAL file for each transaction.

If you do 100 of these transactions, then at least 400 pages (1638400 bytes) are written to the WAL file.

However, these could, in theory, be the same set of 4 pages repeated 100 times (once for each transaction), so when "checkpointed" the roughly 1.6 MB WAL file might coalesce into a 16384-byte database file.
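That coalescing is easy to observe directly. A sketch (Python's sqlite3; file name and iteration count are made up): repeatedly update the same row, which touches roughly the same small set of pages each time, then checkpoint and compare file sizes.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "coalesce.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit
con.execute("PRAGMA journal_mode=WAL")
con.execute("PRAGMA wal_autocheckpoint=0")  # let the WAL grow unchecked
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v INTEGER)")
con.execute("INSERT INTO t VALUES (1, 0)")

for i in range(500):
    # Each transaction rewrites (mostly) the same page(s).
    con.execute("UPDATE t SET v=? WHERE id=1", (i,))

wal_size = os.path.getsize(path + "-wal")  # hundreds of repeated page copies
con.execute("PRAGMA wal_checkpoint(TRUNCATE)")
db_size = os.path.getsize(path)  # coalesces down to a handful of pages
con.close()
```

The WAL holds one copy of the page per transaction, while the checkpointed database needs only the final version, so the database file stays a few pages no matter how many updates ran.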