Incorrect results caused by join
(1) By Zuming Jiang (zuming_jiang) on 2023-03-02 11:39:10 [source]
Set up the database
CREATE TABLE t0 (c0 TEXT, c1 INTEGER);
INSERT INTO t0 VALUES(NULL,-2);
INSERT INTO t0 VALUES(NULL,1);
INSERT INTO t0 VALUES('0',2);
CREATE TABLE t2 (c12 INTEGER);
INSERT INTO t2 VALUES(2);
INSERT INTO t2 VALUES(-2);
Test Case 1
select *
from
(t0 as ref_0
left outer join (select
ref_1.c12 as c0,
(select count(c0) from t0) as c6
from
t2 as ref_1
) as subq_0
on (ref_0.c1 = subq_0.c0 ));
--- return 3 rows:
--- |-2|-2|1
--- |1||
--- 0|2|2|
It returns 3 rows "|-2|-2|1
", "|1||
", "0|2|2|
".
Then I add a WHERE clause where not (subq_0.c0 = 1)
. The test case becomes:
Test Case 2
select *
from
(t0 as ref_0
left outer join (select
ref_1.c12 as c0,
(select count(c0) from t0) as c6
from
t2 as ref_1
) as subq_0
on (ref_0.c1 = subq_0.c0 ))
where not (subq_0.c0 = 1);
--- return 2 rows:
--- 0|2|2|1
--- |-2|-2|1
The results returned by Test Case 2 should be the subset of the results of Test Case 1. However, the results of Test Case 2 contain "0|2|2|1
" which is not contained by the results of Test Case 1.
(2) By Richard Hipp (drh) on 2023-03-02 14:08:32 in reply to 1 [link] [source]
Fixed by check-in 8fe13f7a5e5eb798.
Analysis
This is yet another problem in the optimization that attempts to "flatten" the right operand of a LEFT JOIN (check-in 41c27bc0ff1d3135) from 2017-04-18). Simplified problem statement:
CREATE TABLE t1(x TEXT, y INTEGER); INSERT INTO t1(x,y) VALUES(NULL,-2),(NULL,1),('0',2); CREATE TABLE t2(z INTEGER); INSERT INTO t2(z) VALUES(2),(-2); CREATE VIEW t3 AS SELECT z, (SELECT count(*) FROM t1) AS w FROM t2; SELECT * FROM t1 LEFT JOIN t3 ON y=z;
The correct answer is:
┌──────┬────┬──────┬──────┐ │ x │ y │ z │ w │ ├──────┼────┼──────┼──────┤ │ NULL │ -2 │ -2 │ 3 │ │ NULL │ 1 │ NULL │ NULL │ │ '0' │ 2 │ 2 │ 3 │ └──────┴────┴──────┴──────┘
Prior to check-in 8fe13f7a5e5eb798, the "w" column on the row with "x='0'" was coming out as NULL.
The "w" column is computed by the subquery "(SELECT count(*) FROM t1)
".
Because that subquery is uncorrelated, it is computed once and its result
is stored in a register and reused as necessary. However, the value of the
subquery needs to be reported out as NULL for rows in which the LEFT JOIN does
not match. The generated byte-code was (incorrectly) overwriting the value
of the register that stored the result of the subquery with NULL, so that the
value was reported out as NULL for all subsequent rows, regardless of whether
or not the LEFT JOIN matched.
(3) By Zuming Jiang (zuming_jiang) on 2023-03-02 15:20:13 in reply to 2 [link] [source]
Thanks for fixing it!