Incorrect results caused by OR operations in WHERE clause for complex JOIN tables
(1) By Zuming Jiang (zuming_jiang) on 2023-05-14 21:54:20 [source]
Set up the database
CREATE TABLE t1 (c2 TEXT, c3 REAL, c6 TEXT);
INSERT INTO t1 VALUES('',0.0,'*w0');
INSERT INTO t1 VALUES('!}',-4.93,'');
CREATE TABLE t3 (c20 REAL);
INSERT INTO t3 VALUES(NULL);
Test Case 1
select *
from
(((t1 as ref_0
inner join t1 as ref_2
on null)
full outer join t3 as ref_4
on true)
left outer join ((t1 as ref_6
inner join t1 as ref_7
on (ref_6.c6 = ref_7.c2))
) on true)
where (ref_7.c3 >= ref_7.c3) or (ref_4.c20 <> ref_6.c3);
--- return 1 row: {|||||||!}|-4.93|||0.0|*w0}
Then, I split Test Case 1 into Test Case 2 and Test Case 3 by splitting the predicate (ref_7.c3 >= ref_7.c3) or (ref_4.c20 <> ref_6.c3)
into (ref_7.c3 >= ref_7.c3)
, (ref_4.c20 <> ref_6.c3)
.
Test Case 2
select *
from
(((t1 as ref_0
inner join t1 as ref_2
on null)
full outer join t3 as ref_4
on true)
left outer join ((t1 as ref_6
inner join t1 as ref_7
on (ref_6.c6 = ref_7.c2))
) on true)
where (ref_7.c3 >= ref_7.c3);
--- return 0 row: {}
Test Case 3
select *
from
(((t1 as ref_0
inner join t1 as ref_2
on null)
full outer join t3 as ref_4
on true)
left outer join ((t1 as ref_6
inner join t1 as ref_7
on (ref_6.c6 = ref_7.c2))
) on true)
where (ref_4.c20 <> ref_6.c3);
--- return 0 row: {}
The results of Test Case 1 should be equal to the union of the results of Test Case 2 and Test Case 3. However, both Test Case 2 and Test Case 3 return 0 rows, whose union is not equal to the 1 row returned by Test Case 1.
SQLite version: 3.42.0 2023-05-11 21:15:55 3e9c9bbdb59b9d500ff218db538c047c83da7ac18ebb95c3ee7629ab15e0b43a
(2) By Richard Hipp (drh) on 2023-05-15 02:20:23 in reply to 1 [link] [source]
Should now be fixed at check-in 4902015dcf3869f0.
Simplified Test Case
CREATE TABLE t1(a INT); INSERT INTO t1 VALUES(1); CREATE TABLE t2(b INT); INSERT INTO t2 VALUES(2); CREATE TABLE t3(c INT); INSERT INTO t3 VALUES(3); CREATE TABLE t4(d INT); INSERT INTO t4 VALUES(4); CREATE TABLE t5(e INT); INSERT INTO t5 VALUES(5); SELECT * FROM t1 JOIN t2 ON null RIGHT JOIN t3 ON true LEFT JOIN (t4 JOIN t5 ON d+1=e) ON d=4 WHERE e>0;
The query should return a single row: NULL NULL 3 4 5
Analysis
The query optimizer attempts to build an automatic indexes on t4 and t5 to help with the join of those two tables. (This would be called doing a "hash join" in other RDBMSes.) The optimizer also tried to reduce the size of the automatic index by using the "ON null" constraint on the t1/t2 join. But because that constraint was on the opposite side of a RIGHT JOIN, it is logically incorrect to do so, and an incorrect answer resulted.
This problem has been in the code since the RIGHT JOIN capability was added in version 3.39.0.
(3) By Zuming Jiang (zuming_jiang) on 2023-05-15 09:49:33 in reply to 2 [link] [source]
Thank you for fixing this bug!