Freeze/Hang/Infinite Loop in 3.38
(1) By anonymous on 2022-05-03 08:12:18 [source]
after upgrading to 3.38, one of our queries seems to hang indefinitely, consuming 100% CPU. I did not have the patience to wait and see if it has just become VERY slow or if it is indeed stuck in some kind of infinite loop, but it should normally only take a few milliseconds.
I tested with 3.38.2 via NodeJS better-sqlite3 and 3.38.3 via NodeJS sqlite3, and can reproduce it on both. 3.37.2 works fine.
Getting a minimal reproduction was quite hard, this is the best I got:
select Assignment.id from Assignment inner join [Order] on [Order].id = Assignment.orderId and [Order].storno = 0 and [Order].deleted = 0 left join AssignmentWorkflowStatus on AssignmentWorkflowStatus.assignmentId = Assignment.id and not AssignmentWorkflowStatus.storno and not AssignmentWorkflowStatus.deleted and AssignmentWorkflowStatus.workunitId = '226' inner join Workunit on Assignment.workunitId = Workunit.id and ( Workunit.id = '226' or Workunit.role != 'TECHNICIAN' ) left join AssignmentPoolStatus on AssignmentPoolStatus.assignmentId = Assignment.id and AssignmentPoolStatus.workunitId = '226' where Assignment.deleted = 0 order by Assignment.plannedStart
The problem goes away if I
* remove the
order by clause
* remove any of the joins
* convert the
inner join to
* use a database with fewer entries
analyze (only fixed it on 3.38.2 / better-sqlite3)
Please let me know if you need more info.
Thanks for looking into it!
(2.1) By Richard Hipp (drh) on 2022-05-03 12:43:50 edited from 2.0 in reply to 1 [link] [source]
Thank you for the bug report.
This is in fact a bug in the byte-code generator. The new Bloom filter logic is generating a jump to instructions 0, which causes the whole query to start over again from the beginning, resulting in an infinite loop.
As a work-around, you can disable the Bloom-filter Pull-Down optimization using code like this:
sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x100000);
If you are using the CLI, then the equivalent command is:
.testctrl optimizations 0x100000
If you do the above for the database connection that is running your query, it will disable the Bloom-filter Pull-Down optimization, resulting in a correct answer within milliseconds.
A proper fix is forthcoming, but it will take us a little bit of time to work through all the details.
The Bloom-filter pull-down optimization attempts to evaluate a Bloom filter early - long before it corresponding index lookup is performed - with the idea that if the Bloom filter shows that a match is impossible, lots of preliminary work can be skipped. In this particular instance, part of checking the Bloom filter early involves doing a NOT NULL test on a value and then running the equivalent of "break;" to jump out of a loop if the test fails. However, because the Bloom filter is being evaluated early, the jump destination label has not yet be set, causing the jump to go to instruction 0, which restarts the whole query, resulting in an infinite loop.
A recent trunk check-in adds new assert() statements that will fire if an error like this ever comes up again. If those assert()s had been in the code earlier, probably a fuzzer would of located this problem long ago. If you rerun the OP's test query using the latest trunk version of SQLite, you will get an assertion fault.
(3.1) By Richard Hipp (drh) on 2022-05-03 13:17:36 edited from 3.0 in reply to 1 [link] [source]
CREATE TABLE t1(a INT, b INT, c INT); WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<100) INSERT INTO t1(a,b,c) SELECT x, x*1000, x*1000000 FROM c; CREATE TABLE t2(b INT, x INT); INSERT INTO t2(b,x) SELECT b, a FROM t1 WHERE a%3==0; CREATE INDEX t2b ON t2(b); CREATE TABLE t3(c INT, y INT); INSERT INTO t3(c,y) SELECT c, a FROM t1 WHERE a%4==0; CREATE INDEX t3c ON t3(c); INSERT INTO t1(a,b,c) VALUES(200, 200000, NULL); ANALYZE; SELECT * FROM t1 NATURAL JOIN t2 NATURAL JOIN t3 WHERE x>0 AND y>0;
The final SELECT statement runs an infinite loop in SQLite version 3.38.0 through 3.38.3. An assertion fault occurs on trunk. The NULL value added to column t1.c by the antipenultimate statement (The INSERT just before ANALYZE) is the key ingredient of this bug that I missed during testing. Without that NULL, everything works fine.
NULLs and OUTER JOINs are the bane of all who strive to write SQL query optimizers...
(4) By anonymous on 2022-05-05 09:11:08 in reply to 2.1 [link] [source]
Thank you so much for the quick fix and the detailed explanation! Much appreciated :-)