SQLite Forum

Status of wal2 branch
Login

Status of wal2 branch

(1) By anonymous on 2020-07-07 13:07:22 [link] [source]

Is wal2 considered stable enough to be used in production scenarios?

It looks like the branch is kept well up-to-date with the main branch. Are there any plans to make it into an official feature?

Motivation: our database has an uninterrupted stream of small write operations with relatively strict latency constraints. We want to run in WAL mode, but WAL checkpoints exceed these latency constraints. When WAL checkpoints are moved to a background thread, as the documentation suggests, we run into the ever-growing-wal-file problem.

The wal2 branch solves exactly this problem (we can confirm it works). Other solutions like moving all writes to a background thread or buffering the incoming writes mean we lose a lot of the database's nice features.

Thanks.

(2) By Dan Kennedy (dan) on 2020-07-07 19:14:18 in reply to 1 [link] [source]

Is wal2 considered stable enough to be used in production scenarios?

I don't think so. It doesn't cause any obvious failures in SQLite's test suite. But there are very few new tests - no real effort has been made to find problems. To my knowledge nobody has used it in an actual application, only for performance testing the basic concept of using two wal files.

It looks like the branch is kept well up-to-date with the main branch. Are there any plans to make it into an official feature?

No such plans at this point.

(3) By phpstatic on 2020-11-18 13:27:08 in reply to 2 [source]

Hi Dan,

I like to report a bug find about wal2.

call sqlite3_database_file_object from xOpen get this:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x48)
    frame #0: 0x0000000100056459 tests_db_bench.exe`sqlite3_database_file_object(zName="/private/tmp/test.db-wal2") at sqlite3c.h:56776:18
   56773	    zName--;
   56774	  }
   56775	  pPager = *(Pager**)(zName - 4 - sizeof(Pager*));
-> 56776	  return pPager->fd;
   56777	}
   56778	
   56779	
Target 0: (tests_db_bench.exe) stopped.

(4) By Dan Kennedy (dan) on 2020-11-19 16:10:02 in reply to 3 [link] [source]

Thanks. Should now be fixed here:

https://sqlite.org/src/info/c2426ae8a80d61e1

Dan.

(5) By phpstatic on 2020-11-20 02:13:32 in reply to 4 [link] [source]

You are most welcome.

A small warning:

warning: assigning to 'char *' from 'const char *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
  pRet->zWalName2 = &zWalName[sqlite3Strlen30(zWalName)+1];

If I create a new database, the sqlite will try access journal first, then create journal file. After I set journal_mode = WAL2, sqlite delete + journal and create 2 WAL file.

Is there a way to let sqlite skip create journal file when I intend to use WAL ? (a flags pass to sqlite3_open_v2 to avoid call "pargama journal_mode = WAL" late)

The is VFS only SQLITE_OPEN_WAL, maybe reuse it to do the work(only for create new database).

(6) By Dan Kennedy (dan) on 2020-11-20 11:24:40 in reply to 5 [link] [source]

Thanks again. Now fixed here:

https://sqlite.org/src/info/4f5481bf291c39e2

Is there a way to let sqlite skip create journal file when I intend to use WAL ? (a flags pass to sqlite3_open_v2 to avoid call "pargama journal_mode = WAL" late)

No such flag unfortunately.

Creating and deleting the journal file would be necessary when initializing a new wal database anyhow. Otherwise you would either create race conditions with other processes trying to open the db simultaneously, or else risk leaving a corrupt file on the disk if the application or system failed at an inopportune moment while opening the file.

Dan.

(7) By Joshua Wise (wisej12) on 2021-05-17 20:52:09 in reply to 2 [link] [source]

I'd like to add my name to the list of people who are interested in this feature becoming official. It would solve many of the problems of SQLite I struggle to work around on a daily basis, when working with real-time systems.

(8) By Tyke (tyke007) on 2021-08-09 06:10:53 in reply to 2 [link] [source]

I like to know is there any plan to provide a roadmap for wal2 branch?

From the code I see, it is a runtime option to enable wal2.

Merge it into main branch and a compile time option to enable it(default disabled), will made more people use it, and more people to provide patch to improve it.

(9.2) By Gregory Petrosyan (pgregory) on 2022-06-27 12:29:42 edited from 9.1 in reply to 2 [link] [source]

Hi Dan!

I'd like to report a bug in WAL2 mode. I've got a test application that writes random key-value pairs to a simple 2-column table. Writes were batched in 60-seconds transactions, each key-value pair inserted in a separate savepoint. After a clean shutdown (in the middle of transaction -- but without a "during close" checkpoint, which I avoid by having a readonly connection open), next start I saw this:

sqlite: [26] file is not a database in "PRAGMA journal_mode=WAL2"
sqlite: [11] database corruption at line 66365 of [c8ad869938]

Line 66365 corresponds to this one in my amalgamation: https://sqlite.org/src/file?ci=wal2&name=src/wal.c&ln=4753.

Inserting printf("%d %d %x %d %d\n", walPagesize(pWal), nBuf, pWal->hdr.mxFrame2&0x7fffffff, pWal->hdr.mxFrame2, pWal->hdr.mxFrame); above shows 0 4096 0 -2147483648 425795. I can try to do some more debugging, or I can upload DB + WAL files (around 21GB combined), if that will help.

(10) By Dan Kennedy (dan) on 2022-06-27 13:56:41 in reply to 9.2 [link] [source]

Inserting printf("%d %d %x %d %dn", walPagesize(pWal), nBuf, pWal->hdr.mxFrame2&0x7fffffff, pWal->hdr.mxFrame2, pWal->hdr.mxFrame); above shows 0 4096 0 -2147483648 425795. I can try to do some more debugging, or I can upload DB + WAL files (around 21GB combined), if that will help.

I'm really not sure what this might be.

Is it hitting this line:

http://www.sqlite.org/src/artifact/db537493a3aaa?ln=3210

or one of the other "goto page1_init_failed" lines below it? And what are the contents of the 100 bytes at "page1" set to?

So opening this db file with the wal files in place is enough to reproduce the error? If you have somewhere to upload the 21G of data, I would like to see it..

Thanks,

Dan.

(11) By Gregory Petrosyan (pgregory) on 2022-06-27 15:50:23 in reply to 10 [link] [source]

I think I've found what the problem is (but am not 100% sure).

It looks like both walIndexRecover in 3.39 and new walIndexRecoverOne only set pWal->hdr.szPage when they encounter a commit frame. In case there are no commit frames in WAL, pWal->hdr.szPage would be left at 0. In debug build, that causes an assertion assert( amt>0 ) to fail in unixRead (before SQLITE_CORRUPT_BKPT I've reported) while trying to read page 1, with a call stack like this:

unixRead
sqlite3WalReadFrame
getPageNormal
btreeGetPage
lockBtree
sqlite3InitOne

Initializing pWal->hdr.szPage using szPage from the WAL header fixes the issue for me:

diff --git a/src/wal.c b/src/wal.c
index f8fb749c8..ac42607ed 100644
--- a/src/wal.c
+++ b/src/wal.c
@@ -1589,6 +1589,7 @@ static int walIndexRecoverOne(Wal *pWal, int iWal, u32 *pnCkpt, int *pbZero){
         return SQLITE_OK;
       }
       pWal->hdr.bigEndCksum = (u8)(magic&0x00000001);
+      pWal->hdr.szPage = (u16)((szPage&0xff00) | (szPage>>16));
       pWal->szPage = szPage;
   
       /* Verify that the WAL header checksum is correct */

(13) By Dan Kennedy (dan) on 2022-06-27 21:47:20 in reply to 11 [link] [source]

Thanks very much for chasing this down. Now fixed here:

https://sqlite.org/src/info/f6eafb65a43c650b

It is as you say of course - the problem comes about when the wal2 file contains a valid header that appears to follow the content in the wal file, but no valid transactions. In that case hdr.szPage is never set. hdr.nPage is a problem too in some cases.

Dan.

(14) By Gregory Petrosyan (pgregory) on 2022-06-27 21:56:38 in reply to 13 [link] [source]

Thanks a lot for quick reaction and fix! Can't this be a problem too in a released version (3.39) in regular WAL mode, when no transactions are present?

(15) By Dan Kennedy (dan) on 2022-06-28 11:11:12 in reply to 14 [link] [source]

I think regular wal mode is Ok. If there are no transactions in the only wal file, sqlite3WalFindFrame() will never return a hit and so sqlite3WalReadFrame() will not be called.

Dan.

(12) By Gregory Petrosyan (pgregory) on 2022-06-27 16:49:17 in reply to 10 [link] [source]

Here is a link to all the files: https://disk.yandex.com/d/_lJgbrer1v2Qyg (-shm one is probably overwritten with 0 szPage after a bad recovery).