SQLite Forum

inner join bug with compound queries
Login

inner join bug with compound queries

(1) By Rico Mariani (rmariani) on 2021-08-11 03:37:36 [link] [source]

This query gives the wrong answer on 3.35.x and I think 3.36. It's quite resistant to further reduction.

.version
WITH
T1 (x) AS (SELECT 10),
T2 (y) AS (SELECT 1),
T3 (z) AS (
  SELECT * FROM T2 WHERE 0
  UNION ALL
  SELECT * FROM T2
)
SELECT *
  FROM T1
  INNER JOIN T2
  INNER JOIN T3;
SELECT 'done'

It should return a single row.

SQLite 3.32.3 2020-06-18 14:16:19 02c344aceaea0d177dd42e62c8541e3cab4a26c757ba33b3a31a43ccc7d4aapl
clang-12.0.5
10|1|1
done

However 3.35.x returns no rows.

SQLite 3.35.5 2021-04-19 18:32:05 1b256d97b553a9611efca188a3d995a2fff712759044ba480f9a0c9e98fae886
clang-12.0.5
done

It might be further reducible but it is touchy. Even some of that goofy looking stuff seems to be necessary (it's less goofy in the original context).

One of the guys here tried 3.36 and it also had the same issue.

(2.1) By Rico Mariani (rmariani) on 2021-08-11 03:45:30 edited from 2.0 in reply to 1 [link] [source]

Further notes in case they are helpful

WITH
T1 (x) AS (SELECT 10),
T2 (y) AS (SELECT 1),
T3 (z) AS (
  SELECT * FROM T2 WHERE false
  UNION ALL
  SELECT * FROM T2
)
SELECT *
  FROM T1, T2, T3;

Also fails the same way. But if you use LEFT JOIN instead of INNER JOIN it works.

If you remove T1 from the equation or use T2 two times it works.

If you remove the useless UNION ALL it works (in the real query it's not useless)

By "it works" I mean "it returns one row as expected".

(3) By Rico Mariani (rmariani) on 2021-08-11 03:50:47 in reply to 2.1 [link] [source]

Here are the query plans:

SQLite 3.35.5 2021-04-19 18:32:05 1b256d97b553a9611efca188a3d995a2fff712759044ba480f9a0c9e98fae886
clang-12.0.5
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  |--MATERIALIZE 1
   |  |  `--SCAN CONSTANT ROW
   |  |--MATERIALIZE 2
   |  |  `--SCAN CONSTANT ROW
   |  |--SCAN SUBQUERY 1
   |  |--SCAN SUBQUERY 2
   |  `--SCAN SUBQUERY 2
   `--UNION ALL
      |--SCAN SUBQUERY 1
      |--SCAN SUBQUERY 2
      `--SCAN SUBQUERY 2
done
SQLite 3.32.3 2020-06-18 14:16:19 02c344aceaea0d177dd42e62c8541e3cab4a26c757ba33b3a31a43ccc7d4aapl
clang-12.0.5
QUERY PLAN
|--MATERIALIZE 1
|  `--SCAN CONSTANT ROW
|--MATERIALIZE 2
|  `--SCAN CONSTANT ROW
|--MATERIALIZE 4
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--CO-ROUTINE 2
|     |  |  `--SCAN CONSTANT ROW
|     |  `--SCAN SUBQUERY 2
|     `--UNION ALL
|        |--CO-ROUTINE 2
|        |  `--SCAN CONSTANT ROW
|        `--SCAN SUBQUERY 2
|--SCAN SUBQUERY 1
|--SCAN SUBQUERY 2
`--SCAN SUBQUERY 4
done

Looks like the compound query is in the wrong place.

(4) By Richard Hipp (drh) on 2021-08-11 13:09:26 in reply to 1 [source]

Thanks for the report and for simplifying the test case. Here is an even simpler test case:

WITH
  t1(x) AS (SELECT 111),
  t2(y) AS (SELECT 222),
  t3(z) AS (SELECT * FROM t2 WHERE false UNION ALL SELECT * FROM t2)
SELECT * FROM t1, t3;

I think I understand the malfunction. I should have a fix later today. If you are just curious about what is going wrong, you can keep reading.

Processing begins by expanding the various subqueries defined by the WITH clause. We have:

SELECT *
  FROM (SELECT 111) AS t1,
       (SELECT * FROM (SELECT 222) WHERE false UNION ALL SELECT * FROM (SELECT 222)) AS t2;

The query optimizer then brings the UNION ALL operator to the top-level as follows:

SELECT *
  FROM (SELECT 111) AS t1, (SELECT * FROM (SELECT 222)) AS t2 WHERE false
UNION ALL
SELECT *
  FROM (SELECT 111) AS t1, (SELECT * FROM (SELECT 222)) AS t2;

Next the push down optimization is applied to the left-hand side of the UNION ALL operator to give us:

SELECT *
  FROM (SELECT 111 WHERE false) AS t1, (SELECT * FROM (SELECT 222) WHERE false) AS t2 WHERE false
UNION ALL
SELECT *
  FROM (SELECT 111) AS t1, (SELECT * FROM (SELECT 222)) AS t2;

So far, everything is copacetic. The optimized query is still equivalent to the original. But here comes the problem: While SQLite is generated the byte-code for the left-hand side of the UNION ALL, it sees the "T1" subquery and it notices that T1 will be reused when evaluating the right-hand side of UNION ALL so it saves the result of the first T1 and tries to reuse it for the second T1. SQLite fails to notice that the first T1 was modified by the push-down optimization and is hence no longer equivalent to the second T1. Thus the query that ends up being evaluated is:

SELECT *
  FROM (SELECT 111 WHERE false) AS t1, (SELECT * FROM (SELECT 222) WHERE false) AS t2 WHERE false
UNION ALL
SELECT *
  FROM (SELECT 111 WHERE false) AS t1, (SELECT * FROM (SELECT 222)) AS t2;

The extra "WHERE false" (shown in red) that gets inserted into the second T1 is the cause of the incorrect result.

(5) By Rico Mariani (rmariani) on 2021-08-11 13:22:53 in reply to 4 [link] [source]

Awesome. This explains why its so touchy. And maybe why it doesn't happen if the joins are LEFT.

(6) By Domingo (mingodad) on 2021-08-11 13:54:00 in reply to 4 [link] [source]

Following the explanation a question came to my mind:

Have you done all the optimizations steps described by hand or sqlite shows that steps with a special parameter/flag ?