SQLite User Forum

Running .dump on active DB in WAL mode removes the -wal file
Login

Running .dump on active DB in WAL mode removes the -wal file

(1) By anonymous on 2025-04-29 01:37:08 [link] [source]

I am quite confused at what is happening in the following scenario:

  • Database is started and left running in WAL mode, there are constant writes from a C program, the -wal and -shm files are created and WAL can be observed growing till it reaches ~4MB (after which - assuming - it keeps rolling within)

  • In a shell script, the database is dumped with the use of sqlite3 .dump which succeeds

  • From the exact point of the dump, both -wal and -shm files disappear, but there is no other issue observed, writes continue successfully, dumps can be performed again.

  • Exactly same situation occurs when .backup is used on the DB

What am I missing?

(2) By anonymous on 2025-04-29 05:16:08 in reply to 1 [link] [source]

I continued investigating this in the meantime, I know what is happening but not WHY it is happening.

Having observed the actual .db file getting touched about at the same intervals as before, I checked PRAGMA journal_mode; which returned wal despite missing -wal and -shm files.

I then restarted the whole setup, i.e. same C program opening the DB, the -wal and -shm files showed up. I created a hardlink to the -wal file, peformed .dump, both files disappeared, but I could observe on the hardlinked WAL that is is continuously being written to.

So the -wal and -shm simply get unlinked, I wonder why.

I also proceeded to try some other operations from the shell script, e.g. VACUUM INTO and sure enough, the -wal and -shm get unlinked.

This is on Debian 12:

sqlite3 --version
3.40.1 2022-12-28 14:03:47 df5c253c0b3dd24916e4ec7cf77d3db5294cc9fd45ae7b9c5e82ad8197f3alt1

The C program simply:

sqlite3_open_v2(filename, &db, SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE, NULL);
sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, NULL);
sqlite3_exec(db, "PRAGMA synchronous=NORMAL",  NULL, NULL, NULL);

But I doubt it's related to the C program.

(3) By Gunter Hick (gunter_hick) on 2025-04-29 05:26:53 in reply to 1 [link] [source]

The fact that you can still perform writes after the wal file reaches 4MB would suggest that checkpoints are being performed.

That you can no longer see the wal and shm file suggests that they may have been deleted but still stick around until the last file handle is closed. 'Nix-like-OS should provide and lsof command to show deleted open files.

Better answers might depend on specifying which release of SQLite you are running and how you built it, along with the OS you are using.

(4) By Stephan Beal (stephan) on 2025-04-29 08:55:38 in reply to 2 [link] [source]

I created a hardlink to ...

Section 2.5 of www:/howtocorrupt.html says, in part:

If a single database file has multiple links (either hard or soft links) then that is just another way of saying that the file has multiple names. If two or more processes open the database using different names, then they will use different rollback journals and WAL files. ... In other words, opening and using a database file that has two or more names results in behavior that is undefined and probably undesirable.

(5) By Spindrift (spindrift) on 2025-04-29 12:15:57 in reply to 4 [link] [source]

The poster wasn't hard linking the database file though, just keeping a file system handle onto the wal file to be able to check if it was still being written to despite the original name being unlinked (which it appeared to be).

This wasn't the cause of the original behaviour, just a step in attempting to diagnose what was occuring (assuming I am reading their postings correctly).

(6) By anonymous on 2025-04-29 14:54:09 in reply to 5 [source]

That's exactly correct. I did not think of using lsof, I simply created the hardlink to be able to see if the -wal is being written to (after it appeared deleted). I never touched the hardlink (to the -wal), I was giving the tools only the .db file to work on. There's no way they knew of the hardlink.

(7) By anonymous on 2025-04-29 16:47:34 in reply to 3 [link] [source]

Thanks for your reply, so yes, it was indeed deleted after the sqlite3 run from the shell.

This was on most recent version as packaged for Debian 12 (see above), but in fact, I just compiled:

⁦/usr/local/bin/sqlite3 --version
3.49.1 2025-02-18 13:38:58 873d4e274b4988d260ba8354a9718324a1c26187a4ab4c1cc0227c03d0f10e70 (64-bit)⁦

And it's the same behaviour.

(8) By Cliff (redwoodcliff) on 2025-04-29 16:57:47 in reply to 3 [link] [source]

Hello, OP here again (just registered), so I built the latest version on Debian 12 from:

https://www.sqlite.org/2025/sqlite-autoconf-3490100.tar.gz

./configure
make install

Exactly same behaviour (-wal and -shm files get unlinked after running operation with sqlite3 from shell, but continue to hang around for the C program process).

(I do not know in which order my posts appear, 2 anonymous are now waiting moderator approval - but the gist is that you were correct and it also happens on latest version.)

(9) By Simon Slavin (slavin) on 2025-04-30 14:43:53 in reply to 1 [link] [source]

I admire your detailed observations. But I'm not sure why you posted.

Is there documentation that those files never gets deleted ? They are temporary files created and maintained by SQLite for its own use. Nothing else is supposed to use them. Are you saying that SQLite is violating documented behaviour ?

(10.1) By Cliff (redwoodcliff) on 2025-04-30 18:30:27 edited from 10.0 in reply to 9 [link] [source]

Thanks for your reply. Are you saying it's expected behavior? I am fine with that (although would be curious to e.g. know when they get deleted), I just did not think they would be ever getting deleted (unless perhaps last connection is closed).

It felt quite unnatural to have SQLite be relying on an unlinkned file. I am more concerned about whether it can cause any trouble. I just never observed this before. Perhaps if it was documented as normal, I would feel more at ease. :)

EDIT: One of the reasons I noticed this was that I was going to observe how often the block layer is receiving writes. I cannot do that if the -wal is gone. I could hardlink it to observe what's happening, but then I do not know when the DB might stop using the unlinked -wal and e.g. create a new one. I would like to know how I can be checking the actively writing DB.

(11) By Alan L (ludlovian) on 2025-04-30 21:17:16 in reply to 10.1 [link] [source]

Can you reproduce it without your C program doing the writes - e.g with the CLI doing those?

It doesn't sound right - I've never seen the WAL disappear when there's an open connection, and a scan of the source code suggests the WAL is only unlinked when the pager shuts down (ie when the last connection closes).

And if it is not right, then the error is either in your C program, the SQLite CLI or the SQLite library. So to determine which, try to replicate it by doing the writes another way.

(12.2) By Cliff (redwoodcliff) on 2025-05-01 00:26:15 edited from 12.1 in reply to 11 [link] [source]

It appears it's not happening with the CLI. I tried the following:

declare -i chunk=1000000
declare -i start=$(( $1 * chunk ))
declare -i end=$(( start + chunk -1 ))

sqlite3 test.db "CREATE TABLE IF NOT EXISTS t(x INTEGER PRIMARY KEY, y, z);"
sqlite3 test.db "PRAGMA journal_mode=WAL;"

(

printf "PRAGMA synchronous=NORMAL;"

for i in $(seq $start $end)
do

printf "BEGIN TRANSACTION;"
printf "INSERT INTO t VALUES ('%s', '%s', '%s');" $i "dummy" "dummy"
printf "COMMIT;"

done

) | sqlite3 test.db

Then I ran this twice into the background doing 2 chunks concurrently:

./sqlite-test.sh 1 &
./sqlite-test.sh 2 &

I wait a while (obviously) and can observe the -wal growing with:

watch -n1 ls -la --time-style=full-iso

NOTE: Interestingly, before I was running it concurrently, I basically would not see the -wal growing, it would just appear and the growing was the .db file.

When I run .dump on the same DB from a separate shell, the {-wal,-shm} are still around.

I do not think I am doing anything different than the C program (not mine, but I have the sources). I can only think of creating another test case, also in C. But I just cannot think of anything in that respect which could be causing it.

EDIT: The C code is running multi-threaded, but it really does not do anything than some INSERTs, UPDATEs and DELETEs, grouped into small transactions. If I write another C code with the same library and cannot reproduce this, I will be still at a loss - there won't be much of a difference I could find between the two pieces of code.

EDIT2: I am getting closer to nailing this down the C program or the library. I just tried running SELECT from Python script while the C program is running and sure enough, WAL disappears.

(13) By TripeHound on 2025-05-01 05:49:35 in reply to 12.2 [link] [source]

Thinking aloud: what, if any, Sqlite compilation options are used when compiling the C code that triggers the problem? Does it issue any PRAGMA commands (or C equivalents)? Could it be setting "no WAL mode"?

(14) By Cliff (redwoodcliff) on 2025-05-01 07:43:22 in reply to 13 [link] [source]

There's no special compiler options for the C program, I will have to make some minimalist C test case and build with the same library. But I am still thinking if I was reproducing the same scenario with the Bash script above - the C code is meant to be running multi-threaded (and it definitely does) and there is bursts of writes, on and off.

The connections are not closed after each transaction, but inbetween the bursts. I will have to check when they are closed and if properly - but if they were not, it would be more of an opposite issue (WAL never disappearing) that what I get now. The only thing I can think of which is very obviously different in C is that it is using prepared statements.

But what still got me thinking from the Bash script above - why did I see essentially stagnant WAL (present, but not growing) when I ran it only in a single job.

(15) By Alan L (ludlovian) on 2025-05-01 13:47:30 in reply to 12.2 [link] [source]

NOTE: Interestingly, before I was running it concurrently, I basically would not see the -wal growing, it would just appear and the growing was the .db file.

That's what you'd expect. When there's only one process, the autocheckpoint kicks in at 1000 WAL pages (assuming you've not changed the default), at which point the WAL will be synced - that's the point it suddenly appears in the filesystem in my replication of your test - and the pages are succesfully written back to the database, leaving the file to be reused for the next 1000 pages and so on. So it stays at 4MB whilt the main database grows.

When I run two processes, the autocheckpoint process run by one process usually finds itself unable to carry out the PASSIVE checkpoint because of the write lock from the other process, so the WAL continues to grow with no checkpointing back into the database file.

But none of that explains why the WAL would disappear whilst open. It really does sound like SQLite believes the last connection has been closed whilst the C program doesn't.

Things to consider, if you haven't already:

(16) By TripeHound on 2025-05-01 14:39:26 in reply to 15 [link] [source]

Another "gotcha" is if the C code's multi-threading is being achieved through the use of fork(), and you're opening the database before the fork and then trying to use it after the fork. See 2.6. Carrying an open database connection across a fork() on the How To Corrupt An SQLite Database File page.

Another possibility to check, given:

The [multi-threaded] connections are not closed after each transaction, but inbetween the bursts.

is 2.2. Posix advisory locks canceled by a separate thread doing close()

(In fact, on the off-chance that you've not already done so, it's probably worth checking for anything mentioned on that page).

(17) By Cliff (redwoodcliff) on 2025-05-01 20:47:54 in reply to 15 [link] [source]

the C application being linked with multiple (incompatible) copies of the SQLite library

I went after the low-hanging fruit. Running ldd on the C binary reveals:

libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0

It's even the same as for the sqlite3 binary.

connections or statements being used in more than one thread

What do you mean here, though?

(18) By Cliff (redwoodcliff) on 2025-05-01 20:51:17 in reply to 16 [link] [source]

Thanks, this is interesting. Let me write my own C test case before I start combing through the foreign C code. But I might have to.

I still however am skeptical as the DB is not getting corrupt (not even as the WAL unlinks when doing separate transaction).

(19) By Alan L (ludlovian) on 2025-05-01 21:17:37 in reply to 17 [link] [source]

connections or statements being used in more than one thread

What do you mean here, though?

See https://www.sqlite.org/faq.html#q6.

Threads are evil. You can probably get away with each thread having its own connection & prepared statements. Trying to share them across thread or fork() boundaries (as you would for file I/O) is likely to disappoint. In subtle and hard-to-test ways.

(20) By Cliff (redwoodcliff) on 2025-05-01 22:30:26 in reply to 19 [link] [source]

Thanks, I am getting it set up with the C now, I so far figured that running even just SELECT from another C code built off the same library also unlinks the WAL while the original C program is running.

I now have to make a minimalist test case to try to reproduce it with my own C main writing program. I am basically down to checking what's wrong with the original (not mine) C code.

What I still can't comprehend is - how come this is not happening (unlinking WAL randomly) without my external queries if that original C code is to blame somehow. I hope I can reproduce it at this stage, so to say

(21) By Cliff (redwoodcliff) on 2025-05-01 23:38:54 in reply to 16 [link] [source]

"gotcha" is if the C code's multi-threading is being achieved through the use of fork()

No, it uses g_thread_new as far as I can tell.

2.2. Posix advisory locks canceled by a separate thread doing close()

Nothing there really accesses the DB file directly, all just going through the library.

(22) By Cliff (redwoodcliff) on 2025-05-05 08:03:14 in reply to 16 [link] [source]

I just want to thank you very much!

Another "gotcha" is if the C code's multi-threading is being achieved through the use of fork()

This indeed was the cause!

So for anyone experiencing the same (losing WAL), there's the answer (what can cause that).

(23.2) By Cliff (redwoodcliff) on 2025-05-05 08:12:58 edited from 23.1 in reply to 1 [link] [source]

Thank you everyone who pitched in, it was caused by the original C code - see comment 22 of mine above.


2.6. Carrying an open database connection across a fork()

Do not open an SQLite database connection, then fork(), then try to use that database connection in the child process. All kinds of locking problems will result and you can easily end up with a corrupt database. SQLite is not designed to support that kind of behavior. Any database connection that is used in a child process must be opened in the child process, not inherited from the parent.

Do not even call sqlite3_close() on a database connection from a child process if the connection was opened in the parent. It is safe to close the underlying file descriptor, but the sqlite3_close() interface might invoke cleanup activities that will delete content out from under the parent, leading to errors and perhaps even database corruption.

https://www.sqlite.org/howtocorrupt.html