SIGBUS in sqlite
(1) By liziqi on 2021-08-18 04:00:33 [link]
Hi all. We're currently on SQLite 3.28, and we've been noticing rare SIGBUS errors blaming within SQLite code. The stack trace tends to look like this: memset walIndexAppend walIndexReadHdr walTryBeginRead sqlite3PagerSharedLock sqlite3BtreeBeginTrans sqlite3VdbeExec sqlite3_step In our case, we manipulate the same db, which located in android external storage, whithin multiple processes. And the external storage has plenty of free space left. After I read the source code, I think it maybe a concurrent problem. the case I guess is below: processs A mmap 32kb memory to *-shm file. process B close the db, leads to all advisory locks within db file are released. process C open db and truncate *-shm file because all locks are released. Meanwhile, process A access memmory which correspond to *-shm file. process A Crash. In our case, process run normally before access 4096-th byte and crash when access 4096-th byte. So the first page(4k) in mmap is valid. And sqlite would truncate *-shm file to 3 bytes. This make us skeptical. Could the method using sqlite be correct that multiple process manipulate the same db, such as process A, B open db, process B close, process C open...? Could potentially the case above we guess occur ? Could this potentially be a bug in SQLite that we can help triage? Thank you! Liziqi
(2) By Gunter Hick (gunter_hick) on 2021-08-18 05:58:18 in reply to 1 [link]
Not familiar with android. On linux, the documented "broken POSIX advisory lock" scenario relates to events within a single process. If two file handles point to the same file, closing one will release all locks on the file held by the process, regardless of which file handle was used to obtain them and which thread issued the request. Maybe your "processes" are really threads? Or maybe you are using a separate file handle to "check on the shm file" for debugging purposes in process A?
(3) By liziqi on 2021-08-18 06:26:53 in reply to 2 [link]
You are right.I means that "processes" are threads. On Linux, If "process" above is thread, Could potentially the case above we guess occur ?
(4) By Gunter Hick (gunter_hick) on 2021-08-18 07:05:59 in reply to 3 [link]
This is exactly what one would expect if one is bypassing the SQLite library to access an SQLite db file directly. SQLite Linux drivers know about this and take care to only ever use one file descriptor, even for multiple connections, and resolve symbolic links to the canonical file name.
(5) By Dan Kennedy (dan) on 2021-08-18 11:16:37 in reply to 3 [link]
I think the post you're answering quite reasonably suspects this problem: (https://www.sqlite.org/howtocorrupt.html#_posix_advisory_locks_canceled_by_a_separate_thread_doing_close_) Are you accessing any files directly, bypassing the SQLite library? Or is it possible that there are two copies of the SQLite library linked into the application and being used to access this database?
(6) By liziqi on 2021-08-19 11:42:38 in reply to 5
Yes, Only using java File interface to delete db files, bypassing the SQLite library. In linux, it would not affect sqlite to delete db file. In the application, only one copy of the SQLite library.
(7) By Dan Kennedy (dan) on 2021-08-20 10:51:10 in reply to 6 [link]
It might if you delete a db file while SQLite has it open. Something like: 1. Open handle A, 2. Read from handle A, 3. Delete db file with java interface, 4. Open handle B, 5. Read from handle B, 6. Close handle A. 7. Read from handle B. Will produce the crash you're seeing on Linux in the final step. There are probably other recipes.
(8) By WaterWu on 2021-09-22 12:18:59 in reply to 1 [link]
Hi ziqi we also encountered this kind of problem。Are you using the room? memset（libc） walIndexAppend pagerWalFrames pagerStress getPageNormal freePage2 clearCell sqlite3BtreeDelete sqlite3VdbeExec sqlite3_step
(9) By liziqi on 2021-09-26 07:05:32 in reply to 8 [link]
Yes, still use this room. In our actual scenario, this situation has only occurred once. Under normal circumstances, this happens only if the index file is illegally truncated. So we have checked the code. There is a way of directly deleting the db, wal, shm files in our code, followed by the reconstruction process, so it is suspected to be introduced in this place, but there is no direct evidence. The problem is suspended because it rarely occur.
(10) By Simon Slavin (slavin) on 2021-09-26 13:19:38 in reply to 9 [link]
This worries me, because it suggests you can delete these files while the database is in use. If SQLite closes the last connection to a database in WAL mode, it deletes the .shm file. So if you see a .shm file on disk SQLite thinks that one or more connection is still using the database, so you shouldn't delete any of the above files. If you think a previous run of the database has crashed, and these flies really aren't being used, you can check this by using <pre>sqlite_open() "PRAGMA user_version" (or any other operation which requires reading the file) sqlite_close()</pre> on the database. If no other connections are using the database then this will delete the .shm file.
(11) By liziqi on 2021-10-14 03:18:28 in reply to 8 [link]
Hi WaterWu： Has the problem been solved ? What is the reason ?