short circuiting subqueries with `IN`?
(1) By aryairani on 2021-05-23 18:20:11 [source]
Can I be expect that that my
IN subquery will short-circuit on a positive match, or will the recursion complete first before testing begins?
(I couldn't spot the answer at https://www.sqlite.org/lang_expr.html or https://www.sqlite.org/optoverview.html)
SELECT ? IN ( WITH RECURSIVE found(id) AS ( SELECT self_hash_id FROM causal WHERE self_hash_id = ? UNION ALL SELECT parent_id FROM causal_parent INNER JOIN found ON found.id = causal_id ) SELECT * FROM found )
CREATE TABLE causal ( self_hash_id INTEGER PRIMARY KEY NOT NULL REFERENCES hash(id), value_hash_id INTEGER NOT NULL REFERENCES hash(id) ); CREATE INDEX causal_value_hash_id ON causal(value_hash_id); CREATE TABLE causal_parent ( causal_id INTEGER NOT NULL REFERENCES causal(self_hash_id), parent_id INTEGER NOT NULL REFERENCES causal(self_hash_id), PRIMARY KEY (causal_id, parent_id) ) WITHOUT ROWID; CREATE INDEX causal_parent_causal_id ON causal_parent(causal_id); CREATE INDEX causal_parent_parent_id ON causal_parent(parent_id); CREATE TABLE hash ( id INTEGER PRIMARY KEY NOT NULL, base32 TEXT NOT NULL );
(2) By Larry Brasfield (larrybr) on 2021-05-23 18:59:06 in reply to 1 [link] [source]
As you can see here, the 3rd line of your query can be altered to either
found(id) AS MATERIALIZED (
found(id) AS NOT MATERIALIZED (
. And by prefacing the query with 'EXPLAIN', you can see the effect of those insertions relative to the non-hinted query.
(3) By Keith Medcalf (kmedcalf) on 2021-05-23 19:29:19 in reply to 1 [link] [source]
No, the recursive query will run to completion in order to populate the list.
(4.3) By Keith Medcalf (kmedcalf) on 2021-05-23 20:00:34 edited from 4.2 in reply to 1 [link] [source]
You need to add a "terminiating condition" in order to terminate the recursion.
SELECT ?1 IN ( WITH found(id) AS NOT MATERIALIZED ( SELECT self_hash_id FROM causal WHERE self_hash_id == ?2 UNION ALL SELECT parent_id FROM causal_parent JOIN found ON found.id == causal_id ) SELECT id FROM found WHERE id == ?1 LIMIT 1 ) ;
which will cause the list population to cease once the value is found. If the value is not found then the recursion will complete when the recursion is exhausted.
** NB ** NOT MATERIALIZED is the default, however, the short-circuit termination only applies if
found(id) is not materialized.
(5) By Keith Medcalf (kmedcalf) on 2021-05-23 20:13:02 in reply to 4.3 [link] [source]
An alternate (and very slightly more efficient) spelling would be:
SELECT EXISTS ( WITH found(id) AS NOT MATERIALIZED ( SELECT self_hash_id FROM causal WHERE self_hash_id == ?2 UNION ALL SELECT parent_id FROM causal_parent JOIN found ON found.id == causal_id ) SELECT id FROM found WHERE id == ?1 ) ;
(6) By Keith Medcalf (kmedcalf) on 2021-05-23 20:23:43 in reply to 4.3 [link] [source]
Also note that if your data might have loops then you need to change the
UNION ALL to
UNION in order to prevent getting stuck in a loop.
(7) By aryairani on 2021-05-24 03:46:11 in reply to 5 [link] [source]
Thanks @kmedcalf and @larrybr.
I'm currently using an embedded sqlite v3.28.0; where does that leave me with respect to trying to short-circuit the subquery? I see that
AS MATERIALIZED /
AS NOT MATERIALIZED were introduced only recently, in v3.35.0.
Or... upon a subsequent readings, it sounds like "NOT MATERIALIZED" is what I want, and it is the default prior to v3.35.0; so I'm good, provided I use add the termination condition you'd suggested?
(8) By Keith Medcalf (kmedcalf) on 2021-05-24 05:30:08 in reply to 7 [link] [source]
Yes. Prior to 3.35.0 the default was NOT MATERIALIZED and a table in a CTE was materialized only under certain conditions in which the CTE itself contained a condition which acted as a flattening boundary. The MATERIALIZED / NOT MATERIALIZED hint was added so that one could specifically control that boundary (or rather, specifically prohibit flattening by specifying MATERIALIZED, specifying NOT MATERIALIZED is the equivalent of specifying neither).
So for earlier versions it should work just fine as long as you add the terminating condition to the outer query since it is the query flattener which is triggering the short-circuit evaluation as part of flattening the subquery.