Query performance regression after v3.33.0 (OP_SeekScan?)
(1) By mlin (dnamlin) on 2021-06-01 06:30:08 [source]
My application hit a severe performance regression upgrading SQLite to v3.35.4 from v3.33.0, I think possibly due to the new OP_SeekScan feature. I've distilled a reproduction in this gist:
https://gist.github.com/mlin/a5e9ef7ca0d6583b8397a0fb324e4327
Briefly, our database stores a graph, with nodes, edges connecting them, and indexes thereof. Then we get a 1-column temporary table providing a certain subset of node IDs, and we wish to query for those edges touching only nodes included in the subset:
SELECT count(_rowid_)
FROM edge
WHERE node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes
With the numerous constants in the Python script, SQLite v3.35.5 runs this query roughly 500X slower than v3.31.1. (My Rust application, on which the repro script is based, hit the problem after upgrading from v3.33.0 to v3.35.4)
The high-level "explain query plans" are identical between the two versions, but, the opcode-level "explain" results differ (shown in the gist), notably including the new SeekScan operation. I speculate perhaps the two "IN" clauses in our query lead to some kind of quadratic cost to the new scan attempt?
Thanks in advance for advice/attention.
(2) By mlin (dnamlin) on 2021-06-01 11:54:02 in reply to 1 [link] [source]
Brief PS: simply removing indexes from this example (deleting lines 40 & 41 from my repro script) makes the query hundreds of times faster using v3.35.5, and the plan doesn't use SeekScan in that case. It smells to me like the SeekScan scanning effort parameter (This.P1) might somehow be off by orders of magnitude in the circumstances I've created.
(3) By anonymous on 2021-06-01 13:54:47 in reply to 1 [link] [source]
Any reason not to use a three-way join?
select count(*) from edge, temp.sub_nodes src on src.node_id=node_from, temp.sub_nodes dst on dst.node_id=node_to;
(4) By Richard Hipp (drh) on 2021-06-01 16:48:13 in reply to 1 [link] [source]
As a temporary work-around, please try modifying your query by putting a single "+" in from of the node_from in the WHERE clause:
SELECT count(_rowid_) FROM edge WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes; -- ^-- extra "+" here
(5) By mlin (dnamlin) on 2021-06-01 20:51:46 in reply to 3 [link] [source]
Any reason not to use a three-way join?
Thanks for this suggestion! It does solve the problem in SQLite v3.35, but makes it ~equally as fast as the original version in SQLite v3.31, which suggests the query planner is, impressively, supposed to run my original (more concise and readable?) version in effectively the same way.
Original plus Richard's workaround:
EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes
SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
Three-way join:
EXPLAIN QUERY PLAN SELECT count(edge._rowid_) FROM edge, temp.sub_nodes sub_from ON sub_from.node_id = node_from, temp.sub_nodes sub_to ON sub_to.node_id = node_to
SCAN TABLE sub_nodes AS sub_from
SEARCH TABLE edge USING COVERING INDEX edge_from_to (node_from=?)
SEARCH TABLE sub_nodes AS sub_to USING INTEGER PRIMARY KEY (rowid=?)
The runtimes of these two queries remain roughly equivalent with increasing problem size.
(6) By mlin (dnamlin) on 2021-06-01 20:55:12 in reply to 4 [link] [source]
Yes, this worked like a charm, both in my repro script and in the full application. With the hint, the bytecode no longer uses SeekScan -- one more piece of evidence pointing in that direction. Thank you!!
(7) By Keith Medcalf (kmedcalf) on 2021-06-01 22:34:26 in reply to 1 [link] [source]
Should not that query be exactly equal to:
SELECT count(_rowid_)
FROM edge
WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_from)
AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to)
;
since value IN table
means where exists a row in table where the primary key of table == value?
(8) By mlin (dnamlin) on 2021-06-01 23:30:31 in reply to 7 [link] [source]
I can agree that these queries should produce the same exact result sets, but the query planner certainly seems to distinguish them. All with v3.35.5 and problem size N=1000000:
- Mike's version plus Richard's hint (WHERE...IN)
EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge
WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes
SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
- anonymous' three-way join
EXPLAIN QUERY PLAN SELECT count(edge._rowid_) FROM edge,
temp.sub_nodes sub_from ON sub_from.node_id = node_from,
temp.sub_nodes sub_to ON sub_to.node_id = node_to
SCAN TABLE sub_nodes AS sub_from
SEARCH TABLE edge USING COVERING INDEX edge_from_to (node_from=?)
SEARCH TABLE sub_nodes AS sub_to USING INTEGER PRIMARY KEY (rowid=?)
- Keith's version plus Richard's hint (WHERE...EXISTS)
EXPLAIN QUERY PLAN SELECT count(_rowid_)
FROM edge
WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == +node_from)
AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to)
SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
CORRELATED SCALAR SUBQUERY 1
SEARCH TABLE sub_nodes USING INTEGER PRIMARY KEY (rowid=?)
- Mike's or Keith's version without Richard's hint (disastrously slow, using SeekScan)
SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
Beauty is in the eye of the beholder, but my version is the most concise and most closely approximates what I'd say in English ("gimme the edges whose node_from and node_to are both in sub_nodes). And SQLite v3.33.0 did great with it even without the hint.
I'm certainly learning a ton from all this though, thanks!
(9) By Keith Medcalf (kmedcalf) on 2021-06-02 00:49:47 in reply to 8 [link] [source]
Using the current tip of trunk (my version of a8d921136f) I get the following results (this is after analyze being run).
Clearly the different spellings of the same request result in wildly different query plans and vdbe code being generated, the most efficient being the "direct join" where the planner has the most latitude to choose the nesting order.
The interesting thing is that the IN form should be no less efficient that the EXISTS form, but it is.
sqlite> SELECT count(_rowid_)
...> FROM edge
...> WHERE node_from IN sub_nodes
...> AND node_to IN sub_nodes
...> ;
QUERY PLAN
|--SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) (~44 rows)
|--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
`--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 31 0 0 Start at 31
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 3 8 0 k(3,,,) 0 root=8 iDb=0; edge_to_from
3 ColumnsUsed 3 0 0 3 0
4 Explain 4 0 0 SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) (~44 rows) 0
5 Noop 0 0 0 0 Begin WHERE-loop0: edge
6 Once 0 8 0 0
7 OpenRead 4 4 0 1 0 root=4 iDb=0; sub_nodes
8 Rewind 4 26 0 0
9 Rowid 4 3 0 0 r[3]=rowid
10 IsNull 3 25 0 0 if r[3]==NULL goto 25
11 Once 0 13 0 0
12 OpenRead 5 4 0 1 0 root=4 iDb=0; sub_nodes
13 Rewind 5 25 0 0
14 Rowid 5 4 0 0 r[4]=rowid
15 IsNull 4 24 0 0 if r[4]==NULL goto 24
16 SeekScan 18 19 0 0 Scan-ahead up to 18 rows
17 SeekGE 3 24 3 2 0 key=r[3..4]
18 IdxGT 3 24 3 2 0 key=r[3..4]
19 Noop 0 0 0 0 Begin WHERE-core
20 IdxRowid 3 5 0 0 r[5]=rowid
21 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5])
22 Noop 0 0 0 0 End WHERE-core
23 Next 3 18 0 0
24 Next 5 14 0 0
25 Next 4 9 0 0
26 Noop 0 0 0 0 End WHERE-loop0: edge
27 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1
28 Copy 1 6 0 0 r[6]=r[1]
29 ResultRow 6 1 0 0 output=r[6]
30 Halt 0 0 0 0
31 Transaction 0 0 7 0 1 usesStmtJournal=0
32 Goto 0 1 0 0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929 │
└────────────────┘
Run Time: real 5.817 user 5.703125 sys 0.031250
sqlite> SELECT count(_rowid_)
...> FROM edge
...> WHERE +node_from IN sub_nodes
...> AND node_to IN sub_nodes
...> ;
QUERY PLAN
|--SEARCH edge USING COVERING INDEX edge_to_from (node_to=?) (~44 rows)
|--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
`--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 35 0 0 Start at 35
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 3 8 0 k(3,,,) 2 root=8 iDb=0; edge_to_from
3 ColumnsUsed 3 0 0 3 0
4 Explain 4 0 0 SEARCH edge USING COVERING INDEX edge_to_from (node_to=?) (~44 rows) 0
5 Noop 0 0 0 0 Begin WHERE-loop0: edge
6 Once 0 8 0 0
7 OpenRead 4 4 0 1 0 root=4 iDb=0; sub_nodes
8 Rewind 4 30 0 0
9 Rowid 4 3 0 0 r[3]=rowid
10 IsNull 3 29 0 0 if r[3]==NULL goto 29
11 SeekGE 3 29 3 1 0 key=r[3]
12 IdxGT 3 29 3 1 0 key=r[3]
13 Noop 0 0 0 0 begin IN expr
14 Once 0 16 0 0
15 OpenRead 5 4 0 1 0 root=4 iDb=0; sub_nodes
16 Column 3 1 4 0 r[4]=edge.node_from
17 SeekRowid 5 28 4 0 intkey=r[4]
18 Goto 0 24 0 0
19 Goto 0 28 0 0
20 Rewind 5 28 0 0
21 Column 5 0 5 0 r[5]=
22 Ne 4 28 5 BINARY-8 0 if r[5]!=r[4] goto 28
23 Goto 0 28 0 0 end IN expr
24 Noop 0 0 0 0 Begin WHERE-core
25 IdxRowid 3 5 0 0 r[5]=rowid
26 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5])
27 Noop 0 0 0 0 End WHERE-core
28 Next 3 12 0 0
29 Next 4 9 0 0
30 Noop 0 0 0 0 End WHERE-loop0: edge
31 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1
32 Copy 1 6 0 0 r[6]=r[1]
33 ResultRow 6 1 0 0 output=r[6]
34 Halt 0 0 0 0
35 Transaction 0 0 7 0 1 usesStmtJournal=0
36 Goto 0 1 0 0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929 │
└────────────────┘
Run Time: real 0.124 user 0.015625 sys 0.015625
sqlite> SELECT count(_rowid_)
...> FROM edge
...> WHERE +node_from IN sub_nodes
...> AND +node_to IN sub_nodes
...> ;
QUERY PLAN
|--SCAN edge (~180224 rows)
|--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
`--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 39 0 0 Start at 39
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 0 3 0 2 0 root=3 iDb=0; edge
3 ColumnsUsed 0 0 0 3 0
4 Explain 4 0 0 SCAN edge (~180224 rows) 0
5 Noop 0 0 0 0 Begin WHERE-loop0: edge
6 Rewind 0 34 0 0
7 Noop 0 0 0 0 begin IN expr
8 Once 0 10 0 0
9 OpenRead 3 4 0 1 0 root=4 iDb=0; sub_nodes
10 Column 0 0 3 0 r[3]=edge.node_from
11 SeekRowid 3 33 3 0 intkey=r[3]
12 Goto 0 18 0 0
13 Goto 0 33 0 0
14 Rewind 3 33 0 0
15 Column 3 0 4 0 r[4]=
16 Ne 3 33 4 BINARY-8 0 if r[4]!=r[3] goto 33
17 Goto 0 33 0 0 end IN expr
18 Noop 0 0 0 0 begin IN expr
19 Once 0 21 0 0
20 OpenRead 4 4 0 1 0 root=4 iDb=0; sub_nodes
21 Column 0 1 4 0 r[4]=edge.node_to
22 SeekRowid 4 33 4 0 intkey=r[4]
23 Goto 0 29 0 0
24 Goto 0 33 0 0
25 Rewind 4 33 0 0
26 Column 4 0 5 0 r[5]=
27 Ne 4 33 5 BINARY-8 0 if r[5]!=r[4] goto 33
28 Goto 0 33 0 0 end IN expr
29 Noop 0 0 0 0 Begin WHERE-core
30 Rowid 0 5 0 0 r[5]=rowid
31 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5])
32 Noop 0 0 0 0 End WHERE-core
33 Next 0 7 0 1
34 Noop 0 0 0 0 End WHERE-loop0: edge
35 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1
36 Copy 1 6 0 0 r[6]=r[1]
37 ResultRow 6 1 0 0 output=r[6]
38 Halt 0 0 0 0
39 Transaction 0 0 7 0 1 usesStmtJournal=0
40 Goto 0 1 0 0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929 │
└────────────────┘
Run Time: real 0.151 user 0.031250 sys 0.046875
sqlite> SELECT count(_rowid_)
...> FROM edge
...> WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_from)
...> AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to)
...> ;
QUERY PLAN
|--SCAN edge (~180224 rows)
|--CORRELATED SCALAR SUBQUERY 1
| `--SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--CORRELATED SCALAR SUBQUERY 2
`--SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 49 0 0 Start at 49
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 0 3 0 2 0 root=3 iDb=0; edge
3 ColumnsUsed 0 0 0 3 0
4 Explain 4 0 0 SCAN edge (~180224 rows) 0
5 Noop 0 0 0 0 Begin WHERE-loop0: edge
6 Rewind 0 44 0 0
7 Integer 21 4 0 0 r[4]=21; return address
8 Integer 0 5 0 0 r[5]=0; Init EXISTS result
9 Integer 1 6 0 0 r[6]=1; LIMIT counter
10 OpenRead 1 4 0 0 0 root=4 iDb=0; sub_nodes
11 ColumnsUsed 1 0 0 0 0
12 Explain 12 0 0 SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0
13 Noop 0 0 0 0 Begin WHERE-loop0: sub_nodes
14 Column 0 0 7 0 r[7]=edge.node_from
15 SeekRowid 1 20 7 0 intkey=r[7]
16 Noop 0 0 0 0 Begin WHERE-core
17 Integer 1 5 0 0 r[5]=1
18 DecrJumpZero 6 21 0 0 if (--r[6])==0 goto 21
19 Noop 0 0 0 0 End WHERE-core
20 Noop 0 0 0 0 End WHERE-loop0: sub_nodes
21 Return 4 0 0 0
22 IfNot 5 43 1 0
23 Integer 37 9 0 0 r[9]=37; return address
24 Integer 0 10 0 0 r[10]=0; Init EXISTS result
25 Integer 1 11 0 0 r[11]=1; LIMIT counter
26 OpenRead 2 4 0 0 0 root=4 iDb=0; sub_nodes
27 ColumnsUsed 2 0 0 0 0
28 Explain 28 0 0 SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0
29 Noop 0 0 0 0 Begin WHERE-loop0: sub_nodes
30 Column 0 1 12 0 r[12]=edge.node_to
31 SeekRowid 2 36 12 0 intkey=r[12]
32 Noop 0 0 0 0 Begin WHERE-core
33 Integer 1 10 0 0 r[10]=1
34 DecrJumpZero 11 37 0 0 if (--r[11])==0 goto 37
35 Noop 0 0 0 0 End WHERE-core
36 Noop 0 0 0 0 End WHERE-loop0: sub_nodes
37 Return 9 0 0 0
38 IfNot 10 43 1 0
39 Noop 0 0 0 0 Begin WHERE-core
40 Rowid 0 3 0 0 r[3]=rowid
41 AggStep 0 3 1 count(1) 1 accum=r[1] step(r[3])
42 Noop 0 0 0 0 End WHERE-core
43 Next 0 7 0 1
44 Noop 0 0 0 0 End WHERE-loop0: edge
45 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1
46 Copy 1 14 0 0 r[14]=r[1]
47 ResultRow 14 1 0 0 output=r[14]
48 Halt 0 0 0 0
49 Transaction 0 0 7 0 1 usesStmtJournal=0
50 Goto 0 1 0 0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929 │
└────────────────┘
Run Time: real 0.200 user 0.078125 sys 0.046875
sqlite> SELECT count(edge._rowid_)
...> FROM edge
...> JOIN sub_nodes as s1 ON s1.node_id == node_from
...> JOIN sub_nodes as s2 ON s2.node_id == node_to
...> ;
QUERY PLAN
|--SCAN s1 (~9216 rows)
|--SEARCH edge USING COVERING INDEX edge_from_to (node_from=?) (~2 rows)
`--SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 35 0 0 Start at 35
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 1 4 0 0 0 root=4 iDb=0; sub_nodes
3 ColumnsUsed 1 0 0 0 0
4 OpenRead 3 7 0 k(3,,,) 2 root=7 iDb=0; edge_from_to
5 ColumnsUsed 3 0 0 3 0
6 OpenRead 2 4 0 0 0 root=4 iDb=0; sub_nodes
7 ColumnsUsed 2 0 0 0 0
8 Explain 8 0 0 SCAN s1 (~9216 rows) 0
9 Noop 0 0 0 0 Begin WHERE-loop0: sub_nodes
10 Rewind 1 30 0 0
11 Explain 11 0 0 SEARCH edge USING COVERING INDEX edge_from_to (node_from=?) (~2 rows) 0
12 Noop 0 0 0 0 Begin WHERE-loop1: edge
13 Rowid 1 3 0 0 r[3]=rowid
14 CursorHint 3 0 0 EQ(r[3],c0) 0
15 Rowid 1 4 0 0 r[4]=rowid
16 SeekGE 3 28 4 1 0 key=r[4]
17 IdxGT 3 28 4 1 0 key=r[4]
18 Explain 18 0 0 SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0
19 Noop 0 0 0 0 Begin WHERE-loop2: sub_nodes
20 Column 3 1 5 0 r[5]=edge.node_to
21 SeekRowid 2 26 5 0 intkey=r[5]
22 Noop 0 0 0 0 Begin WHERE-core
23 IdxRowid 3 6 0 0 r[6]=rowid
24 AggStep 0 6 1 count(1) 1 accum=r[1] step(r[6])
25 Noop 0 0 0 0 End WHERE-core
26 Noop 0 0 0 0 End WHERE-loop2: sub_nodes
27 Next 3 17 0 0
28 Noop 0 0 0 0 End WHERE-loop1: edge
29 Next 1 11 0 1
30 Noop 0 0 0 0 End WHERE-loop0: sub_nodes
31 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1
32 Copy 1 7 0 0 r[7]=r[1]
33 ResultRow 7 1 0 0 output=r[7]
34 Halt 0 0 0 0
35 Transaction 0 0 7 0 1 usesStmtJournal=0
36 Goto 0 1 0 0
┌─────────────────────┐
│ count(edge._rowid_) │
├─────────────────────┤
│ 1929 │
└─────────────────────┘
Run Time: real 0.114 user 0.015625 sys 0.000000
sqlite> SELECT count(edge._rowid_)
...> FROM edge
...> JOIN sub_nodes as s1 ON s1.node_id == +node_from
...> JOIN sub_nodes as s2 ON s2.node_id == +node_to
...> ;
QUERY PLAN
|--SCAN edge (~196608 rows)
|--SEARCH s1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 31 0 0 Start at 31
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 0 3 0 2 0 root=3 iDb=0; edge
3 ColumnsUsed 0 0 0 3 0
4 OpenRead 1 4 0 0 0 root=4 iDb=0; sub_nodes
5 ColumnsUsed 1 0 0 0 0
6 OpenRead 2 4 0 0 0 root=4 iDb=0; sub_nodes
7 ColumnsUsed 2 0 0 0 0
8 Explain 8 0 0 SCAN edge (~196608 rows) 0
9 Noop 0 0 0 0 Begin WHERE-loop0: edge
10 Rewind 0 26 0 0
11 Explain 11 0 0 SEARCH s1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0
12 Noop 0 0 0 0 Begin WHERE-loop1: sub_nodes
13 Column 0 0 3 0 r[3]=edge.node_from
14 SeekRowid 1 24 3 0 intkey=r[3]
15 Explain 15 0 0 SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0
16 Noop 0 0 0 0 Begin WHERE-loop2: sub_nodes
17 Column 0 1 4 0 r[4]=edge.node_to
18 SeekRowid 2 23 4 0 intkey=r[4]
19 Noop 0 0 0 0 Begin WHERE-core
20 Rowid 0 5 0 0 r[5]=rowid
21 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5])
22 Noop 0 0 0 0 End WHERE-core
23 Noop 0 0 0 0 End WHERE-loop2: sub_nodes
24 Noop 0 0 0 0 End WHERE-loop1: sub_nodes
25 Next 0 11 0 1
26 Noop 0 0 0 0 End WHERE-loop0: edge
27 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1
28 Copy 1 6 0 0 r[6]=r[1]
29 ResultRow 6 1 0 0 output=r[6]
30 Halt 0 0 0 0
31 Transaction 0 0 7 0 1 usesStmtJournal=0
32 Goto 0 1 0 0
┌─────────────────────┐
│ count(edge._rowid_) │
├─────────────────────┤
│ 1929 │
└─────────────────────┘
Run Time: real 0.138 user 0.046875 sys 0.031250
sqlite>
(10) By mlin (dnamlin) on 2021-06-02 02:25:19 in reply to 9 [link] [source]
Keith: agreed. May I also suggest 10X'ing the problem size (N on line 8 of my script) from 100K to 1M. The result set size is ~20K instead of ~2K. I'm pretty sure I saw one or more of the query plans change in that case. Very interesting indeed! Mike
(11) By Richard Hipp (drh) on 2021-06-02 12:47:10 in reply to 1 [link] [source]
The following script demonstrates the problem. I am adding the script to this forum thread to save is as part of the historical record:
/* Adjust $N and $NSUB as necessary to increase/descrease the
** problem size. $NSUB was $N/10 in the original problem */
.param set $N 10000
.param set $NSUB 1000
CREATE TABLE node(node_id INTEGER PRIMARY KEY);
CREATE TABLE edge(node_from INT, node_to INT);
CREATE TABLE sub_nodes(node_id INTEGER PRIMARY KEY);
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N )
INSERT INTO node(node_id) SELECT NULL FROM s;
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N )
INSERT INTO edge(node_from, node_to) SELECT i, i+1 FROM s;
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N )
INSERT INTO edge(node_from, node_to)
SELECT i, abs((i*1103515245 + 12345) % $N) FROM s;
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$NSUB )
REPLACE INTO sub_nodes(node_id) SELECT abs(i) FROM s;
CREATE INDEX edge_from_to ON edge(node_from,node_to);
CREATE INDEX edge_to_from ON edge(node_to,node_from);
ANALYZE;
.mode box
SELECT * FROM sqlite_stat1;
.stats vmstep
.eqp on
.echo on
SELECT count(*) FROM edge
WHERE node_from IN sub_nodes AND node_to IN sub_nodes;
SELECT count(*) FROM edge
WHERE +node_from IN sub_nodes AND node_to IN sub_nodes;
SELECT count(*) FROM edge NOT INDEXED
WHERE node_from IN sub_nodes AND node_to IN sub_nodes;
(12) By Richard Hipp (drh) on 2021-06-04 18:29:07 in reply to 1 [link] [source]
I believe this issue is now resolved on trunk, check-in 9a2ab6092d644fc3 and later. Please confirm.
(13) By mlin (dnamlin) on 2021-06-05 12:46:12 in reply to 12 [link] [source]
Yes, tested this version with both the repro script and the real app -- looks good! Thanks! The patch's effect seems to be to avoid using OP_SeekScan in the cases I looked at the bytecode for -- so just the caveat, I didn't directly observe the query running fast and using OP_SeekScan.
(14) By Richard Hipp (drh) on 2021-06-05 14:40:00 in reply to 13 [link] [source]
One of our clients has real-world cases that run slowly if OP_SeekScan is turned off. In fact, OP_SeekScan was added specifically to support that client. We have cases in the TH3 test suite that run more slowly when OP_SeekScan is disabled.
If you don't want to ever use OP_SeekScan, you can now disable it at run-time using the SQLITE_TESTCTRL_OPTIMIZATIONS test-control with a mask of 0x20000:
sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x20000);
In the CLI use this dot-command:
.testctrl optimizations 0x20000
If you discover a test case that runs faster with OP_SeekScan disabled, please bring it to our attention by posting it on this forum.