Is `pragma mmap_size` still considered dangerous on macOS?

(1) By Jens Alfke (snej) on 2024-10-15 23:19:57 [link] [source]

We have an old (circa 2017) check in our database-initialization code that keeps us from using pragma mmap_size on the macOS platform. There's a comment saying "avoid possible file corruption on macOS".

None of us can remember exactly what led to this. As far as I can remember, there was some kind of a macOS kernel bug at one time that would sometimes not update the mapped page after a filesystem write to that page; and as a result of that, the SQLite team advised not to use pragma mmap_size on macOS.

Since it's been a long time, I'm circling back to check whether anyone knows whether that problem still exists. (And ideally, in what macOS release it was fixed!)

Thanks.

(2) By Richard Hipp (drh) on 2024-10-16 10:58:57 in reply to 1 [link] [source]

I'm of the opinion that you should never use mmap, because if you get an I/O error of some kind, the OS raises a signal, which SQLite is unable to catch, and so the process dies. When you are not using mmap, SQLite gets back an error code from an I/O error and is able to take remedial action, or at least compose an error message.

I don't recall a case where MacOS was having issues with mmap. But just because I don't recall it doesn't mean it didn't happen.

(3) By Jens Alfke (snej) on 2024-10-16 16:37:30 in reply to 2 [link] [source]

The only mmap-related crashes I've seen came from unexpected unmounting of the filesystem, like yanking out a USB drive or SD card. In those situations I agree one shouldn't use mmap.

If the database is on an internal volume, especially the boot volume, mmap is safer; and that's always the case on iOS for example. If you start getting I/O errors there, you’ve got worse problems than an app crashing! It’s likely that any sort of VM paging or swapping-in code is going to crash.

Have you done any benchmarking of how much mmap helps performance?

(4) By Richard Hipp (drh) on 2024-10-16 16:52:22 in reply to 3 [link] [source]

With mmap turned on SQLite actually uses more CPU cycles in user space. Though, I suppose the advantage of mmap is using fewer CPU cycles in kernel space, so much fewer as to make up for the extra user-space CPU cycles. I don't have measurements of the kernel-space advantages to mmap, because those kinds of measurements are much harder to make repeatably.

(5.1) By Scott Perry (numist) on 2024-10-18 06:25:49 edited from 5.0 in reply to 2 [source]

2017 was a busy year, but I am not aware of any SQLite database corruption issues that were exclusive to mmap (i.e. did not also affect pwrite), dating back to at least 2014.

Writes through a memory map are functionally identical to pwrite once the changes make it to the universal buffer cache (which is synchronous), so fullfsync/barrierfsync (for example) would also behave identically.

That said, there is no reason to mmap database content on Darwin platforms when SQLite already has robust paging built around pread. Overhead for popular syscalls has decreased significantly over the years and the memory copy is not measurable. Last time I checked, the performance penalty for setting PRAGMA cache_size = 0 was smaller than sqlite3_db_config(db, SQLITE_DBCONFIG_LOOKASIDE, NULL, 0, 0).

(6.1) By Scott Perry (numist) on 2024-10-18 06:14:51 edited from 6.0 in reply to 3 [link] [source]

The only mmap-related crashes I've seen came from unexpected unmounting of the filesystem, like yanking out a USB drive or SD card. In those situations I agree one shouldn't use mmap.

The primary conditions where mmap is not resilient (as in "any pointer dereference in the region will crash") on Darwin are mount revocation and truncating a file to zero bytes, the latter of which I've observed in crash reports literally thousands of times by databases on internal volumes from clients that weren't even using a PRAGMA mmap_size > 0. These crashes were mostly (99+%) in walIndexTryHdr, which interacts with the wal-index at the very beginning of a transaction. Since the wal-index must be mapped for practical reasons (it is more "an interprocess shared memory region that can be manifested by path" than an actual file), the libsqlite3.dylib that ships on Darwin platforms includes some mitigations which you may observe as SQLite returning SQLITE_IOERR_VNODE instead of crashing.

If the database is on an internal volume, especially the boot volume, mmap is safer; and that's always the case on iOS for example. If you start getting I/O errors there, you’ve got worse problems than an app crashing! It’s likely that any sort of VM paging or swapping-in code is going to crash.

I'm not sure where you heard this, but it is absolutely not true.

A counterexample that I observe with some regularity is I/O errors caused by block overcommit at the storage layer, driven by filesystem features like clones and/or snapshots, which are both very common on iOS. This condition can cause in-place writes to fail when the region's backing block is marked copy on write and there are no free blocks on the volume. This is often a transient state that can self-resolve fairly quickly (by human timescales) as the operating system responds by purging on-disk caches.

In this condition a write via userland memory operations would cause the process to crash, losing the transaction's changes and any other unsaved user data resident in memory, while pwrite would return ENOSPC to SQLite, which in turn understands how to safely recover and report the error to its client. The virtual memory system itself would encounter no faults.

Have you done any benchmarking of how much mmap helps performance?

Yes! I can only speak for Darwin, where a performance benefit is not consistently measurable using default settings. In the most extreme conditions (no page cache) it provides less of a benefit than the lookaside allocator.