|Title:||Over-aggressive LEFT JOIN optimization on virtual tables|
|Last Modified:||2018-04-10 12:20:16|
|Version Found In:|
drh added on 2018-04-03 12:48:12:
SQLite version 3.23.0 added the new LEFT JOIN strength reduction optimization that converts a LEFT JOIN into an ordinary JOIN if it can be proven that some column of the right-hand table of the LEFT JOIN must be non-NULL on every output row. But the prover is overzealous and believes that equality constraints a virtual table column require elements of the equality to be non-NULL. That may or may not be the case, depending on the virtual table. Here is an example using the generate_series() table-valued function where it is not:
CREATE TABLE t1(x); INSERT INTO t1 VALUES(1),(2); CREATE TABLE t2(y,z); INSERT INTO t2 VALUES(2,1); SELECT * FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series WHERE start=z AND stop=2;
Because of the way the generate_series virtual table works, the WHERE clause term start=z can be true even if z is NULL. But the prover does not realize that and assumes that z must never be NULL, and hence the LEFT JOIN is allowed to change into a JOIN. An incorrect answer results.
One could argue that this is a bug in the generate_series virtual table implementation, not in the LEFT JOIN strength reduction prover. Nevertheless, we think this kind of problem is common with virtual tables, so it seems good to enhance the prover to assume that constraints against a virtual table might be true even if elements of the expression are NULL.
This problem was discovered by E.Pasma and reported on the SQLite users mailing list.
drh added on 2018-04-03 13:38:48:
Further investigation confirms that the problem is really in the generate_series virtual table. If the series.c file is compiled with the -DSQLITE_SERIES_CONSTRAINT_VERIFY=1 flag, then the generate_series virtual table does the right thing, and the correct answer is obtained.
Meanwhile, Peter Nichvolodov observes that the test case can be simplified using CTEs:
WITH t1(x) AS (VALUES(1),(2)), t2(y,z) AS (VALUES(2,1)) SELECT x, z, value, 'x' FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series WHERE start=z AND stop=2;
drh added on 2018-04-03 14:37:43:
I am increasingly viewing this problem as a long-standing bug in the generate_series implementation that was merely exposed by the LEFT JOIN strength reduction optimization (LJSRO), and not a bug in the LJSRO. For that reason, the severity of this ticket is downgraded to "Important".