SIGBUS errors in HHVM
Hi all. I work on HHVM, Facebook's PHP JIT compiler. We're currently on SQLite 3.30, and we've been seeing rare SIGBUS errors whose stack traces point into SQLite code. The stack trace tends to look like this:
sqlite3WalFindFrame.constprop.0
readDbPage
getPageNormal
moveToLeftmost
sqlite3VdbeExec
sqlite3_step
sqlite3_exec
HPHP::Facts::(anonymous namespace)::AutoloadDBImpl::analyze()
I'm simply executing a prepared ANALYZE statement. The DB is stored on a disk that has plenty of free space left.
I don't have a reliable repro yet, so I'm mostly asking exploratory questions at this stage.
How can we prevent these SIGBUS errors in our application? I saw that, in 2012, this ticket led to this bugfix in SQLite's code. Could this potentially be a bug in SQLite that we can help triage?
sqlite3WalFindFrame() accesses the *-shm file via an mmap() mapping. So one way this could happen is if some external program or malfunctioning SQLite library is truncating the *-shm file while it is being used.
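To make the truncation scenario concrete, here is a minimal sketch, assuming Linux and a local filesystem. It does not touch SQLite at all: a throwaway child process maps an ordinary file, truncates it to simulate the external interference described above, and then reads from the mapping. The names `CHILD` and `truncate_while_mapped` are made up for this illustration.

```python
import signal
import subprocess
import sys

# Script run in a throwaway child process, so the crash does not take
# down the caller. The file path is passed as the first argument.
CHILD = r"""
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
m = mmap.mmap(fd, 4096)   # map the first 4096 bytes of the file
os.ftruncate(fd, 0)       # stand-in for another program truncating the file
m[0]                      # touch a page that no longer has backing storage
"""

def truncate_while_mapped(path):
    """Return the child's exit status after it reads a truncated mapping."""
    with open(path, "wb") as f:
        f.write(b"\0" * 4096)
    proc = subprocess.run([sys.executable, "-c", CHILD, path])
    # subprocess reports death by signal N as a return code of -N.
    return proc.returncode
```

If the child is killed as expected, the return value equals `-signal.SIGBUS`. The fault is raised at the memory access, not at the `ftruncate()` call, which matches a crash that shows up inside a read path such as sqlite3WalFindFrame().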
In 2012, it looks like the SIGBUS happened because there was insufficient disk space, and Linux doesn't actually allocate the page until the first time it is accessed. So the first time SQLite accessed the page - SIGBUS. This can't happen if you use fallocate() (or a series of write() calls) to preallocate the pages.
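As a rough illustration of the preallocation idea (not SQLite's actual code), the sketch below reserves disk space with `posix_fallocate()` before mapping a file. The helper name `map_preallocated` is hypothetical, and the example assumes a Unix system where Python's `os.posix_fallocate` is available.

```python
import mmap
import os

def map_preallocated(path, size):
    """Create `path`, reserve `size` bytes on disk, then map the file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # If the disk is full, this call raises OSError (ENOSPC) right here,
        # instead of the process receiving SIGBUS later on first page access.
        os.posix_fallocate(fd, 0, size)
        # Python's mmap duplicates the descriptor, so closing fd below
        # does not invalidate the mapping.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)
```

The point of the design is that an out-of-space condition becomes an ordinary error return that the caller can handle, rather than a signal delivered at an arbitrary memory access.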
After the crash happens, is the *-shm file left on disk zero bytes in size?
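One way to collect that information after a crash is a small script along these lines. It is only a sketch: the helper name `wal_sidecar_sizes` is made up, and it relies on SQLite naming the WAL and shared-memory files by appending `-wal` and `-shm` to the database file name.

```python
import os

def wal_sidecar_sizes(db_path):
    """Return sizes in bytes of the -wal and -shm files next to db_path.

    A missing file is reported as None so it can be told apart from a
    file that exists but is zero bytes long.
    """
    sizes = {}
    for suffix in ("-wal", "-shm"):
        try:
            sizes[suffix] = os.stat(db_path + suffix).st_size
        except FileNotFoundError:
            sizes[suffix] = None
    return sizes
```

Running this from a crash handler or a post-mortem script, before anything reopens the database, keeps the reading as close as possible to the state at the time of the SIGBUS.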
The DB is stored on a disk that has plenty of free space left.
Is it a local or network disk? File i/o via mmap can result in SIGBUS if accessing a network filesystem that is experiencing problems.
(possibly this is a reflection of the more general issue that the mmap API does not provide any way for I/O errors to be reported)
Our developers have their SQLite DBs on local disks running Btrfs. Developers do occasionally run into SQLITE_IOERR errors when writing to disk.
I'll get back to you when I have a reliable answer to this question😅