SQLite Forum

CROSS JOIN sometimes re-orders tables when lhs is a CTE
Login

CROSS JOIN sometimes re-orders tables when lhs is a CTE

(1) By Rob G (rob.golsteijn) on 2021-06-23 09:32:44 [link] [source]

Hi,

I wanted to manually optimize a query by using a CROSS JOIN. This should prevent the optimizer to change the order in which tables are visited. But with my query Sqlite still uses a table at the right hand side of the CROSS JOIN for the outer loop. I expected it to use the table at the left hand side of the CROSS JOIN (a CTE in my case) as the outer loop.

Tested with sqlite version 3.34.0.

The query below is simplified a lot wrt the actual version I use, but most likely not a "minimal" example. Semantically it is meaningless now, but it is only intended to show that Sqlite CROSS JOINs do not always prevents re-ordering as documented on https://www.sqlite.org/lang_select.html, Side note in section 2.1. A workaround to make Sqlite start with the records from the cte, is to write the results of the cte to an separate table, and use this table in the main query instead of the cte.

Example:

CREATE TABLE IF NOT EXISTS aaa(a1 INTEGER PRIMARY KEY, a2, a3);
CREATE TABLE IF NOT EXISTS ccc(c1 INTEGER PRIMARY KEY, c2, c3);

CREATE INDEX IF NOT EXISTS idx_aaa_a2 ON aaa(a2);
CREATE INDEX IF NOT EXISTS idx_ccc_c2 ON ccc(c2);

.eqp ON

WITH cte(c1, c2, c3, c4, c5) AS
            (SELECT ccc.c2 AS c1, ccc.c1 AS c2, ccc.c1 AS c3, ccc.c3 AS c4, aaa.a3 <> 0 AS c5
               FROM ccc INNER JOIN aaa ON aaa.a1 = ccc.c3 WHERE ccc.c2 = 12345
              UNION
             SELECT cte.c1 AS c1, ccc2.c1 AS c2, ccc2.c1 AS c3, ccc2.c3 AS c4, aaa.a3 <> 0 AS c5
               FROM cte
                    INNER JOIN ccc ccc1     ON ccc1.c3 = cte.c4 AND ccc1.c2 <> cte.c1
                    INNER JOIN ccc ccc2     ON ccc2.c3 = ccc1.c2 AND ccc2.c3 <> ccc1.c3
                    INNER JOIN aaa          ON aaa.a1 = ccc2.c3
              WHERE cte.c5 = 1)
SELECT *
  FROM             cte cte
        CROSS JOIN ccc outccc1  ON outccc1.c1  = cte.c1
        INNER JOIN ccc outccc2  ON outccc2.c2  = outccc1.c2
                               AND outccc2.c3 <> outccc1.c3
    WHERE cte.c5 = 0;

QUERY PLAN
|--CO-ROUTINE 2
|  |--SETUP
|  |  |--SEARCH TABLE ccc USING INDEX idx_ccc_c2 (c2=?)
|  |  `--SEARCH TABLE aaa USING INTEGER PRIMARY KEY (rowid=?)
|  `--RECURSIVE STEP
|     |--SCAN TABLE cte
|     |--SEARCH TABLE ccc AS ccc1 USING AUTOMATIC COVERING INDEX (c3=?)
|     |--SEARCH TABLE ccc AS ccc2 USING AUTOMATIC COVERING INDEX (c3=?)
|     `--SEARCH TABLE aaa USING INTEGER PRIMARY KEY (rowid=?)
|--SCAN TABLE ccc AS outccc2
|--SEARCH SUBQUERY 2 AS cte USING AUTOMATIC PARTIAL COVERING INDEX (c5=?)
`--SEARCH TABLE ccc AS outccc1 USING INDEX idx_ccc_c2 (c2=? AND rowid=?)

Although outccc2 is at the right hand side of a CROSS JOIN it is used as outer loop. The request is that the no-reorder requirement of the CROSS JOIN is fullfilled (or to document that it does not hold if the left had side operand is a cte).

(2) By Richard Hipp (drh) on 2021-06-23 11:09:01 in reply to 1 [link] [source]

SEARCH TABLE ccc AS outccc1 USING INDEX idx_ccc_c2 (c2=? AND rowid=?) (Emphasis added)

I'm less concerned about the CROSS JOIN issue that I am about why the query planner is using an index here, rather than just doing a direct ROWID lookup. Investigating now...

Simplified test case:

CREATE TABLE IF NOT EXISTS aaa(a1 INTEGER PRIMARY KEY, a2, a3);
CREATE TABLE IF NOT EXISTS ccc(c1 INTEGER PRIMARY KEY, c2, c3);
CREATE INDEX IF NOT EXISTS xc2 ON ccc(c2);
.eqp ON
WITH bbb(b1, b2) AS(
   VALUES(1,2)
   UNION ALL
   SELECT b1+1, b2+1 FROM bbb, aaa WHERE b2=1
)
SELECT *
  FROM bbb
 CROSS JOIN ccc AS c11 ON c11.c1  = bbb.b1
 INNER JOIN ccc AS c22 ON c22.c2  = c11.c2 AND c22.c3 <> c11.c3
 WHERE bbb.b1 = 0;

Output:

QUERY PLAN
|--CO-ROUTINE bbb
|  |--SETUP
|  |  `--SCAN CONSTANT ROW
|  `--RECURSIVE STEP
|     |--SCAN bbb
|     `--SCAN aaa
|--SCAN c22
|--SEARCH bbb USING AUTOMATIC PARTIAL COVERING INDEX (b1=?)
`--SEARCH c11 USING INDEX xc2 (c2=? AND rowid=?)

(3) By Keith Medcalf (kmedcalf) on 2021-06-23 11:09:11 in reply to 1 [link] [source]

outcc2 is on the RHS of an INNER JOIN, not a CROSS JOIN.

CROSS JOIN only preserves the descent order from the LHS table to the RHS table.

cte is still in the outer loop with respect to outcc1 as specified by the CROSS JOIN.

If you want lookup into outcc2 to occur in the innermost loop, you need to change the word INNER to CROSS in order to prevent re-ordering:

sqlite> WITH cte(c1, c2, c3, c4, c5) AS
   ...>             (SELECT ccc.c2 AS c1, ccc.c1 AS c2, ccc.c1 AS c3, ccc.c3 AS c4, aaa.a3 <> 0 AS c5
   ...>                FROM ccc INNER JOIN aaa ON aaa.a1 = ccc.c3 WHERE ccc.c2 = 12345
   ...>               UNION
   ...>              SELECT cte.c1 AS c1, ccc2.c1 AS c2, ccc2.c1 AS c3, ccc2.c3 AS c4, aaa.a3 <> 0 AS c5
   ...>                FROM cte
   ...>                     INNER JOIN ccc ccc1     ON ccc1.c3 = cte.c4 AND ccc1.c2 <> cte.c1
   ...>                     INNER JOIN ccc ccc2     ON ccc2.c3 = ccc1.c2 AND ccc2.c3 <> ccc1.c3
   ...>                     INNER JOIN aaa          ON aaa.a1 = ccc2.c3
   ...>               WHERE cte.c5 = 1)
   ...> SELECT *
   ...>   FROM             cte cte
   ...>         CROSS JOIN ccc outccc1  ON outccc1.c1  = cte.c1
   ...>         CROSS JOIN ccc outccc2  ON outccc2.c2  = outccc1.c2
   ...>                                AND outccc2.c3 <> outccc1.c3
   ...>     WHERE cte.c5 = 0;
QUERY PLAN
|--CO-ROUTINE cte
|  |--SETUP
|  |  |--SEARCH ccc USING INDEX idx_ccc_c2 (c2=?) (~10 rows)
|  |  `--SEARCH aaa USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
|  `--RECURSIVE STEP
|     |--SCAN cte (~524288 rows)
|     |--SEARCH ccc1 USING AUTOMATIC COVERING INDEX (c3=?) (~20 rows)
|     |--SEARCH ccc2 USING AUTOMATIC COVERING INDEX (c3=?) (~20 rows)
|     `--SEARCH aaa USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
|--SCAN cte (~100663296 rows)
|--SEARCH outccc1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--SEARCH outccc2 USING INDEX idx_ccc_c2 (c2=?) (~9 rows)

(4.1) By Keith Medcalf (kmedcalf) on 2021-06-23 11:27:02 edited from 4.0 in reply to 2 [source]

Deleted

(5) By Keith Medcalf (kmedcalf) on 2021-06-23 11:27:15 in reply to 3 [link] [source]

That is, more specifically, in the expression:

A, B LEFT JOIN C JOIN D CROSS JOIN E

Only tables D and E are affected by the ordering constraint implied by the usage of the word CROSS. Re-ordering of all the tables is permitted so long as E is an inner table with respect to D.

You are (apparently) interpreting the phrase prevents the query optimizer from reordering the tables in the join as meaning tables A, B, C, D, and E. However, this is not the case. The tables being cross joined and to which the ordering restriction applies are D and E only.

You will also note that except for the specific case of a LEFT JOIN (where the ON clause specifies the conditions for the descent into the table on the RHS of the LEFT JOIN) an ON clause is merely syntactic sugar for a WHERE clause.

(6) By Richard Hipp (drh) on 2021-06-23 12:10:13 in reply to 2 [link] [source]

why the query planner is using an index here, rather than just doing a direct ROWID lookup

Answer: With a rowid lookup, the query planner expects to find one row of result. But with the indexed lookup on both column C2 and the rowid, it is expecting to find something closer to zero rows. And it turns out that doing an index lookup that finds zero rows is less work than a rowid lookup that finds one row, at least when downstream computations are taken into account.

As for the original CROSS JOIN question, Keith Medcalf has that right: The CROSS JOIN only determines the ordering of its immediate left and right terms. In the original query, the outccc2 table is allowed to reorder across the CROSS JOIN because it is not immediately adjacent to the CROSS JOIN. In other words, a CROSS JOIN is not a reordering barrier. The CROSS JOIN simply determines the relative order of its immediate left and right tables.

(7) By Rob G (robgolsteijn) on 2021-06-25 12:02:31 in reply to 5 [link] [source]

Thank you for the explanation. I indeed misinterpreted the documentation. I read what I hoped to read. Fixed my query now.