Possible Bug in 3.38.1
(1) By Dieter Schmeer (dschmeer) on 2022-03-24 22:57:17 [source]
I think I found a bug in 3.38.1 that was not present in 3.37.2, maybe caused by the Bloom filter.
I simplified the problem to these two tables:
CREATE TABLE VersicherungenXX_tb ( Versicherungsnummer INTEGER PRIMARY KEY NOT NULL, Adressnummer INT ); CREATE TABLE AdressenXX_tb ( Adressnummer INTEGER PRIMARY KEY NOT NULL, Geburtsdatum DATE );
In my case, there is an Adressnummer in AdressenXX_tb for every Adressnummer in VersicherungenXX_tb, and Geburtsdatum is always '1900-01-01' and never NULL.
So the following statement should return an empty result set:
SELECT ve.Versicherungsnummer FROM VersicherungenXX_tb AS ve LEFT JOIN AdressenXX_tb AS ad ON ad.Adressnummer=ve.Adressnummer WHERE ad.Geburtsdatum IS NULL ORDER BY ve.Versicherungsnummer;
In 3.37.2, it actually returns an empty set, but in 3.38.1, it returns all rows of VersicherungenXX_tb.
Unfortunately, this is only the case if there are many rows in the tables, so I prepared a database that has the necessary entries:
Can anybody please have a look at this and confirm if this is a bug?
Thanks a lot
(2) By Richard Hipp (drh) on 2022-03-24 23:37:34 in reply to 1 [link] [source]
Confirmed. The following is a simplified test case:
CREATE TABLE t1(a INTEGER PRIMARY KEY, b INT); CREATE TABLE t2(c INTEGER PRIMARY KEY, d INT); WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<100) INSERT INTO t1(a,b) SELECT x, 1000*x FROM c; INSERT INTO t2(c,d) SELECT b*2, 1000000*a FROM t1; ANALYZE; DELETE FROM sqlite_stat1; INSERT INTO sqlite_stat1(tbl,idx,stat) VALUES ('t1',NULL,150105),('t2',NULL,98747); ANALYZE sqlite_schema; .eqp on SELECT count(*) FROM t1 LEFT JOIN t2 ON c=b WHERE c IS NULL; SELECT count(*) FROM t1 LEFT JOIN t2 ON +c=+b WHERE +c IS NULL; .testctrl optimizations 0x80000 SELECT count(*) FROM t1 LEFT JOIN t2 ON c=b WHERE c IS NULL;
Copy/paste the above into the CLI and see
that the first query gets the wrong answer whereas the second and third
get the correct answer. The "
.eqp on" line turns on automatic
EXPLAIN QUERY PLAN, which shows that the first query is using the
new Bloom filter optimization. The "
.testctrl" line turns off the
Bloom filter optimization and consequently gets the correct answer.
The second query gets the correct answer because the extra "+" operators
disable the use of indexes.
(3) By Richard Hipp (drh) on 2022-03-25 00:04:46 in reply to 2 [link] [source]
The problem appears to be that the "
c IS NULL" expression in the WHERE
clause is being used to restrict the entries in the Bloom filter. But that
is not correct. Because the Bloom filter is for the right-hand table of a
LEFT JOIN, only expressions in the ON clause can be used to restrict the
The expression "
c IS NULL" is always false. So the Bloom filter is empty.
The algorithm uses the Bloom filter to help evaluate "
c=b" and because the
Bloom filter is empty, it thinks that "
c=b" is always false, which is
incorrect. Because the algorithm thinks that "
c=b" is always false,
it always substitutes the value "
(NULL,NULL)" for the right-hand table.
That value for the right-hand table satisfies the WHERE clause, and so
every row in the left-hand table is output.
(4.1) By Richard Hipp (drh) on 2022-03-25 11:04:14 edited from 4.0 in reply to 1 [link] [source]
Thanks for the bug report. Please try the fix at check-in 1a6328f2a5b49730 and confirm that it fixes your problem.
(5) By Dieter Schmeer (dschmeer) on 2022-03-25 16:16:39 in reply to 4.1 [link] [source]
Thanks for the fix. I tested it with our original problem and I can confirm that it works.