SQLite Forum

short circuiting subqueries with `IN`?
Login

short circuiting subqueries with `IN`?

(1) By aryairani on 2021-05-23 18:20:11 [link] [source]

Can I be expect that that my IN subquery will short-circuit on a positive match, or will the recursion complete first before testing begins?

(I couldn't spot the answer at https://www.sqlite.org/lang_expr.html or https://www.sqlite.org/optoverview.html)

Thanks!

SELECT ? IN (
  WITH RECURSIVE
    found(id) AS (
      SELECT self_hash_id
        FROM causal
        WHERE self_hash_id = ?
      UNION ALL
      SELECT parent_id
        FROM causal_parent
        INNER JOIN found ON found.id = causal_id
    )
  SELECT * FROM found
)

given

CREATE TABLE causal (
  self_hash_id INTEGER PRIMARY KEY NOT NULL REFERENCES hash(id),
  value_hash_id INTEGER NOT NULL REFERENCES hash(id)
);
CREATE INDEX causal_value_hash_id ON causal(value_hash_id);

CREATE TABLE causal_parent (
  causal_id INTEGER NOT NULL REFERENCES causal(self_hash_id),
  parent_id INTEGER NOT NULL REFERENCES causal(self_hash_id),
  PRIMARY KEY (causal_id, parent_id)
) WITHOUT ROWID;
CREATE INDEX causal_parent_causal_id ON causal_parent(causal_id);
CREATE INDEX causal_parent_parent_id ON causal_parent(parent_id);

CREATE TABLE hash (
  id INTEGER PRIMARY KEY NOT NULL,
  base32 TEXT NOT NULL
);

(2) By Larry Brasfield (larrybr) on 2021-05-23 18:59:06 in reply to 1 [link] [source]

As you can see here, the 3rd line of your query can be altered to either found(id) AS MATERIALIZED ( or found(id) AS NOT MATERIALIZED ( . And by prefacing the query with 'EXPLAIN', you can see the effect of those insertions relative to the non-hinted query.

(3) By Keith Medcalf (kmedcalf) on 2021-05-23 19:29:19 in reply to 1 [link] [source]

No, the recursive query will run to completion in order to populate the list.

(4.3) By Keith Medcalf (kmedcalf) on 2021-05-23 20:00:34 edited from 4.2 in reply to 1 [source]

You need to add a "terminiating condition" in order to terminate the recursion.

For example:

SELECT ?1 IN (
               WITH found(id) AS NOT MATERIALIZED
                    (
                        SELECT self_hash_id
                          FROM causal
                         WHERE self_hash_id == ?2
                     UNION ALL
                        SELECT parent_id
                          FROM causal_parent
                          JOIN found
                            ON found.id == causal_id
                    )
             SELECT id
               FROM found
              WHERE id == ?1
              LIMIT 1
            )
;

which will cause the list population to cease once the value is found. If the value is not found then the recursion will complete when the recursion is exhausted.

** NB ** NOT MATERIALIZED is the default, however, the short-circuit termination only applies if found(id) is not materialized.

(5) By Keith Medcalf (kmedcalf) on 2021-05-23 20:13:02 in reply to 4.3 [link] [source]

An alternate (and very slightly more efficient) spelling would be:

SELECT EXISTS (
                 WITH found(id) AS NOT MATERIALIZED
                      (
                          SELECT self_hash_id
                            FROM causal
                           WHERE self_hash_id == ?2
                       UNION ALL
                          SELECT parent_id
                            FROM causal_parent
                            JOIN found
                              ON found.id == causal_id
                      )
               SELECT id
                 FROM found
                WHERE id == ?1
              )
;

(7) By aryairani on 2021-05-24 03:46:11 in reply to 5 [link] [source]

Thanks @kmedcalf and @larrybr.

I'm currently using an embedded sqlite v3.28.0; where does that leave me with respect to trying to short-circuit the subquery? I see that AS MATERIALIZED / AS NOT MATERIALIZED were introduced only recently, in v3.35.0.

Or... upon a subsequent readings, it sounds like "NOT MATERIALIZED" is what I want, and it is the default prior to v3.35.0; so I'm good, provided I use add the termination condition you'd suggested?

(8) By Keith Medcalf (kmedcalf) on 2021-05-24 05:30:08 in reply to 7 [link] [source]

Yes. Prior to 3.35.0 the default was NOT MATERIALIZED and a table in a CTE was materialized only under certain conditions in which the CTE itself contained a condition which acted as a flattening boundary. The MATERIALIZED / NOT MATERIALIZED hint was added so that one could specifically control that boundary (or rather, specifically prohibit flattening by specifying MATERIALIZED, specifying NOT MATERIALIZED is the equivalent of specifying neither).

So for earlier versions it should work just fine as long as you add the terminating condition to the outer query since it is the query flattener which is triggering the short-circuit evaluation as part of flattening the subquery.

(6) By Keith Medcalf (kmedcalf) on 2021-05-23 20:23:43 in reply to 4.3 [link] [source]

Also note that if your data might have loops then you need to change the UNION ALL to UNION in order to prevent getting stuck in a loop.