SQLite Forum

Incorrect results caused by join
Login

Incorrect results caused by join

(1) By Zuming Jiang (zuming_jiang) on 2023-03-02 11:39:10 [source]

Set up the database

CREATE TABLE t0 (c0 TEXT, c1 INTEGER); INSERT INTO t0 VALUES(NULL,-2); INSERT INTO t0 VALUES(NULL,1); INSERT INTO t0 VALUES('0',2); CREATE TABLE t2 (c12 INTEGER); INSERT INTO t2 VALUES(2); INSERT INTO t2 VALUES(-2);

Test Case 1 select * from (t0 as ref_0 left outer join (select ref_1.c12 as c0, (select count(c0) from t0) as c6 from t2 as ref_1 ) as subq_0 on (ref_0.c1 = subq_0.c0 )); --- return 3 rows: --- |-2|-2|1 --- |1|| --- 0|2|2|

It returns 3 rows "|-2|-2|1", "|1||", "0|2|2|".

Then I add a WHERE clause where not (subq_0.c0 = 1). The test case becomes:

Test Case 2

select * from (t0 as ref_0 left outer join (select ref_1.c12 as c0, (select count(c0) from t0) as c6 from t2 as ref_1 ) as subq_0 on (ref_0.c1 = subq_0.c0 )) where not (subq_0.c0 = 1); --- return 2 rows: --- 0|2|2|1 --- |-2|-2|1

The results returned by Test Case 2 should be the subset of the results of Test Case 1. However, the results of Test Case 2 contain "0|2|2|1" which is not contained by the results of Test Case 1.

(2) By Richard Hipp (drh) on 2023-03-02 14:08:32 in reply to 1 [link] [source]

Fixed by check-in 8fe13f7a5e5eb798.

Analysis

This is yet another problem in the optimization that attempts to "flatten" the right operand of a LEFT JOIN (check-in 41c27bc0ff1d3135) from 2017-04-18). Simplified problem statement:

CREATE TABLE t1(x TEXT, y INTEGER);
INSERT INTO t1(x,y) VALUES(NULL,-2),(NULL,1),('0',2);
CREATE TABLE t2(z INTEGER);
INSERT INTO t2(z) VALUES(2),(-2);
CREATE VIEW t3 AS SELECT z, (SELECT count(*) FROM t1) AS w FROM t2;
SELECT * FROM t1 LEFT JOIN t3 ON y=z;

The correct answer is:

┌──────┬────┬──────┬──────┐
│  x   │ y  │  z   │  w   │
├──────┼────┼──────┼──────┤
│ NULL │ -2 │ -2   │ 3    │
│ NULL │ 1  │ NULL │ NULL │
│ '0'  │ 2  │ 2    │ 3    │
└──────┴────┴──────┴──────┘

Prior to check-in 8fe13f7a5e5eb798, the "w" column on the row with "x='0'" was coming out as NULL.

The "w" column is computed by the subquery "(SELECT count(*) FROM t1)". Because that subquery is uncorrelated, it is computed once and its result is stored in a register and reused as necessary. However, the value of the subquery needs to be reported out as NULL for rows in which the LEFT JOIN does not match. The generated byte-code was (incorrectly) overwriting the value of the register that stored the result of the subquery with NULL, so that the value was reported out as NULL for all subsequent rows, regardless of whether or not the LEFT JOIN matched.

(3) By Zuming Jiang (zuming_jiang) on 2023-03-02 15:20:13 in reply to 2 [link] [source]

Thanks for fixing it!