/ View Ticket
Ticket Hash: fac496b61722daf28dd85a2f1b6aca20a2fb7863
Title: Over-aggressive LEFT JOIN optimization on virtual tables
Status: Fixed Type: Code_Defect
Severity: Important Priority: Immediate
Subsystem: Code_Generator Resolution: Fixed
Last Modified: 2018-04-10 12:20:16
Version Found In:
User Comments:
drh added on 2018-04-03 12:48:12: (text/x-fossil-wiki)
SQLite version 3.23.0 added the new LEFT JOIN strength reduction optimization
that converts a LEFT JOIN into an ordinary JOIN if it can be proven that some
column of the right-hand table of the LEFT JOIN must be non-NULL on every output
row.  But the prover is overzealous and believes that equality constraints
a virtual table column require elements of the equality to be non-NULL.  That
may or may not be the case, depending on the virtual table.  Here is an
example using the generate_series() table-valued function where it is not:

SELECT * FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series
 WHERE start=z AND stop=2;

Because of the way the generate_series virtual table works, 
the WHERE clause term start=z
can be true even if z is NULL.  But the prover does not realize that and assumes
that z must never be NULL, and hence the LEFT JOIN is allowed to change
into a JOIN.  An incorrect answer results.

One could argue that this is a bug in the generate_series virtual table
implementation, not in the LEFT JOIN strength reduction prover.  Nevertheless,
we think this kind of problem is common with virtual tables, so it seems
good to enhance the prover to assume that constraints against a virtual
table might be true even if elements of the expression are NULL.

This problem was discovered by E.Pasma and reported on the SQLite users
mailing list.

drh added on 2018-04-03 13:38:48: (text/x-fossil-wiki)
Further investigation confirms that the problem is really in the
generate_series virtual table.  If the series.c file is compiled with
the -DSQLITE_SERIES_CONSTRAINT_VERIFY=1 flag, then the generate_series
virtual table does the right thing, and the correct answer is obtained.

Meanwhile, Peter Nichvolodov observes that the test case can be
simplified using CTEs:

  t1(x) AS (VALUES(1),(2)),
  t2(y,z) AS (VALUES(2,1))
SELECT x, z, value, 'x' FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series
 WHERE start=z AND stop=2;

drh added on 2018-04-03 14:37:43: (text/x-fossil-wiki)
I am increasingly viewing this problem as a long-standing bug in the
generate_series implementation that was merely exposed by the LEFT JOIN
strength reduction optimization (LJSRO), and not a bug in the LJSRO.
For that reason, the severity of this ticket is downgraded to "Important".