SQLite Forum

Huge WAL file, but not for the usual reasons

Huge WAL file, but not for the usual reasons

(1) By relatively_random on 2020-11-19 14:20:36 [source]


This is my first post here, but I wonder if some veterans may have encountered a similar issue or at least have a hunch on what could be wrong. I've been using SQLite for a few years now, but lately I just use .Net and EF Core instead of writing my own SQL.

A few weeks ago, I've encountered two machines with huge WAL files of ~3 GB. (The databases themselves are ~7 GB.) Normally, those apps maintain their WALs at few kilobytes at most. Initially, I suspected I messed something up by making some transactions too big (top-level deletes with many dependents), but last week it happened again on one machine, even though I'm not doing those big transactions anymore.

I know it's normal for WAL to grow as long as there's constantly a connection open, but even closing the application doesn't get rid of it or the shared memory file. Even restarting the machine, without ever starting the application, and attempting to open the database with DB Browser for SQLite doesn't help. It takes ages to launch (I don't know how long exactly, but definitely over 5 minutes) and then WAL and shared memory files still remain.

Bizarrely, here's what did help. I zipped up the three files to investigate locally, but then on a hunch I deleted the originals and extracted the files back again. Launching the app took seconds and the extra files were gone just like that. The same thing happened on my laptop with the unzipped files. I ran the integrity check on my machine and the database seems to be fine.

Here's a rough description of the app, if it means anything.

  • EF Core 2.1.14 (it uses SQLite 3.28.0)
  • .Net Framework 4.7.1
  • except for some rare short-lived overlap with read-only access, only one dedicated thread accesses the database
  • individual connections (DbContexts) are never used concurrently
  • the dedicated database thread closes the connection roughly every 2 seconds max
  • shared memory is used with PRAGMA mmap_size = 268435456 (256 MiB)
  • synchronous mode is NORMAL
  • the OS is Windows 7

Kind regards,

(2) By relatively_random on 2020-12-28 08:39:49 in reply to 1 [link] [source]

I managed to figure it out so here's my report.

I temporarily switched to journal mode, which helpfully reported errors instead of silently failing.

The error I was having was SQLITE_IOERR_WRITE. After more digging, it turned out the root Windows error causing all this was 665 - hitting into an NTFS file system limitation. An answer on Stack Overflow then led me to the actual cause: extreme file fragmentation of the database.

Copying the file reduces fragmentation, which is why the bizarre fix I mentioned, copying the file, temporarily worked. The actual fix was to schedule defragmentation using Sysinternals' Contig.

It would probably be helpful if WAL mode also returned an error instead of silently failing to checkpoint, to be honest. But I'm not complaining. I'm sure there are reasons why it works the way it does.