integrity_check(100) calls progress callback millions of times.
(1) By sqlighter (macsql20) on 2022-03-26 15:19:30 [link] [source]
A user reported that the diagnosis function my software performs on request suddenly takes several hours (up to 15) to complete. Normal would be something like 30 minutes.
The database file has a size of 50G and generally performs well.
After some digging, I've figured that the problem is caused by the call to integrity_check(100)
, which is called as part of the diagnosis.
Note: I don't use quick_check
because it does not check indexes, and the aim of the diagnosis is to find all possible problems.
Before calling integrity_check(100)
, my code sets a progress handler by calling sqlite3_progress_handler
.
The callback function increments a variable and every 5000th call checks a timestamp (uint64 comparison) and if more than 2 seconds have passed, checks a boolean flag to see if the user wants to abort. This has been optimized to be as quick as possible.
This logic is in place for many years and works well. Except for this database.
According to the log my software generates, integrity_check(100)
` calls the callback function 2,051,859,359,986 times - before the user gave up and cancelled it.
For a 20GB test database, the callback is called only 1,488,536 times at the runtime is less than 10 minutes!
I have seen this only once before, for a similarly big database a year ago. We never found a solution for this. The user had to create the database from scratch. Same size, same contents, integrity_check takes less than an hour now.
A 50 GB database size is huge, but not that uncommon. Also not the number of records in the database or anything. It generally performs well, except for integrity_check
, which seems to run in a tight and probably infinite loop. And this has happened after the user added more data to the database.
Interestingly, when the user deliberately deletes some data from the database (say, 5%), integrity_check runs a lot faster and completes in 30 minutes.
Can this be a bug in SQLite?. Something I can check on my side? Some fringe condition that makes integrity_check fall into an infinite loop?
(2) By sqlighter (macsql20) on 2022-03-30 11:19:46 in reply to 1 [link] [source]
Anyone with an idea?
(3) By Richard Hipp (drh) on 2022-03-30 11:35:11 in reply to 1 [link] [source]
I cannot reproduce the problem.
Do you have the database that causes the problem? Can you run experiments? If so, then try this:
Build with "./configure --enable-debug"
In the CLI, add ".eqp trace" prior to running PRAGMA integrity_check.
Watch the output to figure out where it is looping.
Some simpler things to try first, perhaps:
PRAGMA quick_check;
← no errors this way?PRAGMA integrity_check(1);
← no errors this way?
(4) By sqlighter (macsql20) on 2022-04-04 08:57:35 in reply to 3 [source]
I don't have the database here in my lab and it is 50GB in size. Maybe I can send a copy of the CLI to the user and explain him what to do. Or maybe he can somehow upload the database so I can work with it.
Thank you very much for the tips, I'll see what I can do.
(5) By sqlighter (macsql20) on 2022-04-07 16:44:40 in reply to 4 [link] [source]
The user was able to send me the database file. It has almost 52GB.
Using sqlite3.exe
(as downloaded from your web site), I can run both PRAGMA quick_check;
and PRAGMA integrity_check(100);
successfully from a Windows command line. The stats returned are
Memory Used: 2357000 (max 3956816) bytes
Number of Outstanding Allocations: 1935 (max 1957)
Number of Pcache Overflow Bytes: 2000016 (max 2004112) bytes
Largest Allocation: 1597613 bytes
Largest Pcache Allocation: 4256 bytes
Lookaside Slots Used: 122 (max 123)
Successful lookaside attempts: 896
Lookaside failures due to size: 13
Lookaside failures due to OOM: 640
Pager Heap Usage: 2084628 bytes
Page cache hits: 1600526068
Page cache misses: 95923530
Page cache writes: 0
Page cache spills: 0
Schema Heap Usage: 35904 bytes
Statement Heap/Lookaside Usage: 130032 bytes
Fullscan Steps: 0
Sort Operations: 0
Autoindex Inserts: 0
Virtual Machine Steps: -1440742944
Reprepare operations: 0
Number of times run: 1
Memory used by prepared stmt: 130032
and sqlite3.exe
reads 392.908.401.629 bytes during that step. Disk saturation (SSD) is near 100% the whole time.
I can run quick_check();
from my own code. It reads about 168 GB and completes in 540 seconds.
integrity_check(1)
or integrity_check(100)
never complete.
After reading about 800 GB of data, the disk activity drops to virtually 0, but the process proceeds to show increasing I/O counters for read in Windows Task Manager. This looks SQLite is running in a loop, reading and reading data already in the file system cache.
When I break into the debugger, I see this:
winRead(sqlite3_file * id, void * pBuf, int amt, __int64 offset) Line 44157 C
sqlite3OsRead(sqlite3_file * id, void *) Line 22870 C
readDbPage(PgHdr * pPg) Line 54447 C
getPageNormal(Pager * pPager, unsigned int pgno, PgHdr * * ppPage, int flags) Line 56990 C
sqlite3PagerGet(Pager * pPager, unsigned int) Line 57105 C
getAndInitPage(BtShared * pBt, unsigned int pgno, MemPage * * ppPage, BtCursor * pCur, int bReadOnly) Line 66260 C
moveToChild(BtCursor * pCur, unsigned int newPgno) Line 69178 C
sqlite3BtreeMovetoUnpacked(BtCursor * pCur, UnpackedRecord * pIdxKey, __int64 intKey, int biasRight, int * pRes) Line 69712 C
sqlite3VdbeExec(Vdbe * p) Line 89382 C
sqlite3Step(Vdbe * p) Line 83196 C
sqlite3_step(sqlite3_stmt * pStmt) Line 83261 C
Since the command line application runs through, I guess it is something in the settings I use with SQLite that is causing this. But I have no real ideas where to start.
In my code, I just execute PRAGMA integrity_check(100);
and then wait until it returns. This code is in place for years and so far always worked. Except for two very large databases. One a couple of months ago, and now this one.
I get the same result whether or not I use a progress callback, but with progress callback it takes even longer for quick_check
and analyze
to return. The callback is called hundreds of millions, if not billions of times.
Here are the settings I use for SQLite. Anything stupid that could impact the performance of integrity_check
so badly?
PRAGMA cache_size=50000;
PRAGMA auto_vacuum=NONE
PRAGMA temp_store=MEMORY;
PRAGMA foreign_keys=ON;
PRAGMA journal_mode=WAL;
PRAGMA checkpoint_fullfsync=0;
PRAGMA wal_autocheckpoint=20000;
I'm happy to make any changes or run tests you would consider useful.
(6) By anonymous on 2022-04-07 22:05:47 in reply to 5 [link] [source]
Maybe the difference between cli and your program is forcing the temp_store to memory?
(7) By sqlighter (macsql20) on 2022-04-08 15:34:35 in reply to 6 [link] [source]
Thanks. I will give this a try. Below are the options I use (I compile SQLite with Visual Studio 2019):
SQLITE_OMIT_DEPRECATED
SQLITE_ENABLE_FTS5=1
SQLITE_ENABLE_STAT2=1
SQLITE_THREADSAFE=1
SQLITE_ENABLE_JSON1=1
SQLITE_CORE=1
SQLITE_DEFAULT_MEMSTATUS=0
(8) By sqlighter (macsql20) on 2022-04-08 17:18:15 in reply to 7 [link] [source]
When letting it run, SQLite reads 1.3 TB of data (?) and then seems to loop infinitely in this code fragment:
#ifndef SQLITE_OMIT_PROGRESS_CALLBACK
/* Call the progress callback if it is configured and the required number
** of VDBE ops have been executed (either since this invocation of
** sqlite3VdbeExec() or since last time the progress callback was called).
** If the progress callback returns non-zero, exit the virtual machine with
** a return code SQLITE_ABORT.
*/
while( nVmStep>=nProgressLimit && db->xProgress!=0 ){
assert( db->nProgressOps!=0 );
nProgressLimit += db->nProgressOps;
if( db->xProgress(db->pProgressArg) ){
nProgressLimit = 0xffffffff;
rc = SQLITE_INTERRUPT;
goto abort_due_to_error;
}
}
#endif
with: nVmStep 4294967265 nProgressLimit 1348085984
My callback just increments a number and it has been called 2 billion times...?
(9) By sqlighter (macsql20) on 2022-04-12 11:11:17 in reply to 8 [link] [source]
Any idea what I should check in my code?
The user cannot run the diagnosis anymore and rebuilding the database will take him weeks...
(10) By anonymous on 2022-04-12 13:38:57 in reply to 8 [link] [source]
That's really interesting. Could you tell us the exact version of SQLite that you use? Also, how large is db->nProgressOps
?
Why is nVmStep
exactly 232-31? It's an unsigned 64-bit integer and it's only ever incremented by one inside sqlite3VdbeExec
. Did SQLite really execute this many VDBE operations without reaching the point where it's safe to call the progress callback enough times?
nProgressLimit
is also interesting; it's a 64-bit unsigned integer whose value is currently large but would fit in a 32-bit unsigned integer. I see that the value of nProgressLimit
may depend on p->aCounter[SQLITE_STMTSTATUS_VM_STEP]
, which is a 32-bit unsigned integer, whose value is set from nVmStep
by casting it to an (int)
.
A large nVmStep
may have overflown the int
conversion, but so what? nProgressLimit
is always initially set to a number between 0
and db->nProgressOps
using unsigned arithmetic. Even a very unlucky iPrior
would only disturb the situation by one call of db->xProgress
.
(11) By sqlighter (macsql20) on 2022-04-15 14:57:12 in reply to 10 [link] [source]
Good News! And shame on me!
It seems I have missed a couple of updates of SQLite (I was used to getting emails about updates, but apparently this does not work anymore).
Anyway, updating to the most recent version of SQLite solved the problem. Yay!
The PRAGMA integrity_check
now completes in 30 Minutes, reading approximately 2.5 TB for the 50 GB database.
Thanks for all users who provided support and advice.