Query performance regression after v3.33.0 (OP_SeekScan?)
(1) By mlin (dnamlin) on 2021-06-01 06:30:08 [link]
My application hit a severe performance regression upgrading SQLite to v3.35.4 from v3.33.0, I think possibly due to the new OP_SeekScan feature. I've distilled a reproduction in this gist: https://gist.github.com/mlin/a5e9ef7ca0d6583b8397a0fb324e4327 Briefly, our database stores a graph, with nodes, edges connecting them, and indexes thereof. Then we get a 1-column temporary table providing a certain subset of node IDs, and we wish to query for those edges touching only nodes included in the subset: ``` SELECT count(_rowid_) FROM edge WHERE node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes ``` With the numerous constants in the Python script, SQLite v3.35.5 runs this query roughly 500X slower than v3.31.1. (My [Rust application](https://github.com/mlin/gfabase), on which the repro script is based, hit the problem after upgrading from v3.33.0 to v3.35.4) The high-level "explain query plans" are identical between the two versions, but, the opcode-level "explain" results differ (shown in the gist), notably including the new SeekScan operation. I speculate perhaps the *two* "IN" clauses in our query lead to some kind of quadratic cost to the new scan attempt? Thanks in advance for advice/attention.
(2) By mlin (dnamlin) on 2021-06-01 11:54:02 in reply to 1 [link]
Brief PS: simply removing indexes from this example (deleting lines 40 & 41 from my repro script) makes the query hundreds of times faster using v3.35.5, and the plan doesn't use SeekScan in that case. It smells to me like the SeekScan scanning effort parameter (This.P1) might somehow be off by orders of magnitude in the circumstances I've created.
(3) By anonymous on 2021-06-01 13:54:47 in reply to 1 [link]
Any reason not to use a three-way join? <pre> select count(*) from edge, temp.sub_nodes src on src.node_id=node_from, temp.sub_nodes dst on dst.node_id=node_to; </pre>
(4) By Richard Hipp (drh) on 2021-06-01 16:48:13 in reply to 1 [link]
As a temporary work-around, please try modifying your query by putting a single "+" in from of the node_from in the WHERE clause: > ~~~ SELECT count(_rowid_) FROM edge WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes; -- ^-- extra "+" here ~~~
(5) By mlin (dnamlin) on 2021-06-01 20:51:46 in reply to 3 [link]
> Any reason not to use a three-way join? Thanks for this suggestion! It does solve the problem in SQLite v3.35, but makes it ~equally as fast as the original version in SQLite v3.31, which suggests the query planner is, impressively, supposed to run my original (more concise and readable?) version in effectively the same way. Original plus Richard's workaround: > ``` EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes``` ``` SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?) USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR ``` Three-way join: > ``` EXPLAIN QUERY PLAN SELECT count(edge._rowid_) FROM edge, temp.sub_nodes sub_from ON sub_from.node_id = node_from, temp.sub_nodes sub_to ON sub_to.node_id = node_to ``` ``` SCAN TABLE sub_nodes AS sub_from SEARCH TABLE edge USING COVERING INDEX edge_from_to (node_from=?) SEARCH TABLE sub_nodes AS sub_to USING INTEGER PRIMARY KEY (rowid=?) ``` The runtimes of these two queries remain roughly equivalent with increasing problem size.
(6) By mlin (dnamlin) on 2021-06-01 20:55:12 in reply to 4 [link]
Yes, this worked like a charm, both in my repro script and [in the full application](https://github.com/mlin/gfabase/actions/runs/897285548). With the hint, the bytecode no longer uses SeekScan -- one more piece of evidence pointing in that direction. Thank you!!
(7) By Keith Medcalf (kmedcalf) on 2021-06-01 22:34:26 in reply to 1 [link]
Should not that query be exactly equal to: ``` SELECT count(_rowid_) FROM edge WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_from) AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to) ; ``` since `value IN table` means where exists a row in table where the primary key of table == value?
(8) By mlin (dnamlin) on 2021-06-01 23:30:31 in reply to 7 [link]
I can agree that these queries should produce the same exact result sets, but the query planner certainly seems to distinguish them. All with v3.35.5 and problem size N=1000000: - Mike's version plus Richard's hint (WHERE...IN) ``` EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?) USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR ``` - anonymous' three-way join ``` EXPLAIN QUERY PLAN SELECT count(edge._rowid_) FROM edge, temp.sub_nodes sub_from ON sub_from.node_id = node_from, temp.sub_nodes sub_to ON sub_to.node_id = node_to SCAN TABLE sub_nodes AS sub_from SEARCH TABLE edge USING COVERING INDEX edge_from_to (node_from=?) SEARCH TABLE sub_nodes AS sub_to USING INTEGER PRIMARY KEY (rowid=?) ``` - Keith's version plus Richard's hint (WHERE...EXISTS) ``` EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == +node_from) AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to) SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?) USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR CORRELATED SCALAR SUBQUERY 1 SEARCH TABLE sub_nodes USING INTEGER PRIMARY KEY (rowid=?) ``` - Mike's or Keith's version *without* Richard's hint (disastrously slow, using SeekScan) ``` SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR ``` -- Beauty is in the eye of the beholder, but my version is the most concise and most closely approximates what I'd say in English ("gimme the edges whose node_from and node_to are both in sub_nodes). And SQLite v3.33.0 did great with it even without the hint. I'm certainly learning a ton from all this though, thanks!
(9) By Keith Medcalf (kmedcalf) on 2021-06-02 00:49:47 in reply to 8 [link]
Using the current tip of trunk (my version of a8d921136f) I get the following results (this is after analyze being run). Clearly the different spellings of the same request result in wildly different query plans and vdbe code being generated, the most efficient being the "direct join" where the planner has the most latitude to choose the nesting order. The interesting thing is that the IN form should be no less efficient that the EXISTS form, but it is. ``` sqlite> SELECT count(_rowid_) ...> FROM edge ...> WHERE node_from IN sub_nodes ...> AND node_to IN sub_nodes ...> ; QUERY PLAN |--SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) (~44 rows) |--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR `--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 31 0 0 Start at 31 1 Null 0 1 2 0 r[1..2]=NULL 2 OpenRead 3 8 0 k(3,,,) 0 root=8 iDb=0; edge_to_from 3 ColumnsUsed 3 0 0 3 0 4 Explain 4 0 0 SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) (~44 rows) 0 5 Noop 0 0 0 0 Begin WHERE-loop0: edge 6 Once 0 8 0 0 7 OpenRead 4 4 0 1 0 root=4 iDb=0; sub_nodes 8 Rewind 4 26 0 0 9 Rowid 4 3 0 0 r[3]=rowid 10 IsNull 3 25 0 0 if r[3]==NULL goto 25 11 Once 0 13 0 0 12 OpenRead 5 4 0 1 0 root=4 iDb=0; sub_nodes 13 Rewind 5 25 0 0 14 Rowid 5 4 0 0 r[4]=rowid 15 IsNull 4 24 0 0 if r[4]==NULL goto 24 16 SeekScan 18 19 0 0 Scan-ahead up to 18 rows 17 SeekGE 3 24 3 2 0 key=r[3..4] 18 IdxGT 3 24 3 2 0 key=r[3..4] 19 Noop 0 0 0 0 Begin WHERE-core 20 IdxRowid 3 5 0 0 r[5]=rowid 21 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5]) 22 Noop 0 0 0 0 End WHERE-core 23 Next 3 18 0 0 24 Next 5 14 0 0 25 Next 4 9 0 0 26 Noop 0 0 0 0 End WHERE-loop0: edge 27 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 28 Copy 1 6 0 0 r[6]=r[1] 29 ResultRow 6 1 0 0 output=r[6] 30 Halt 0 0 0 0 31 Transaction 0 0 7 0 1 usesStmtJournal=0 32 Goto 0 1 0 0 ┌────────────────┐ │ count(_rowid_) │ ├────────────────┤ │ 1929 │ └────────────────┘ Run Time: real 5.817 user 5.703125 sys 0.031250 sqlite> SELECT count(_rowid_) ...> FROM edge ...> WHERE +node_from IN sub_nodes ...> AND node_to IN sub_nodes ...> ; QUERY PLAN |--SEARCH edge USING COVERING INDEX edge_to_from (node_to=?) (~44 rows) |--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR `--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 35 0 0 Start at 35 1 Null 0 1 2 0 r[1..2]=NULL 2 OpenRead 3 8 0 k(3,,,) 2 root=8 iDb=0; edge_to_from 3 ColumnsUsed 3 0 0 3 0 4 Explain 4 0 0 SEARCH edge USING COVERING INDEX edge_to_from (node_to=?) (~44 rows) 0 5 Noop 0 0 0 0 Begin WHERE-loop0: edge 6 Once 0 8 0 0 7 OpenRead 4 4 0 1 0 root=4 iDb=0; sub_nodes 8 Rewind 4 30 0 0 9 Rowid 4 3 0 0 r[3]=rowid 10 IsNull 3 29 0 0 if r[3]==NULL goto 29 11 SeekGE 3 29 3 1 0 key=r[3] 12 IdxGT 3 29 3 1 0 key=r[3] 13 Noop 0 0 0 0 begin IN expr 14 Once 0 16 0 0 15 OpenRead 5 4 0 1 0 root=4 iDb=0; sub_nodes 16 Column 3 1 4 0 r[4]=edge.node_from 17 SeekRowid 5 28 4 0 intkey=r[4] 18 Goto 0 24 0 0 19 Goto 0 28 0 0 20 Rewind 5 28 0 0 21 Column 5 0 5 0 r[5]= 22 Ne 4 28 5 BINARY-8 0 if r[5]!=r[4] goto 28 23 Goto 0 28 0 0 end IN expr 24 Noop 0 0 0 0 Begin WHERE-core 25 IdxRowid 3 5 0 0 r[5]=rowid 26 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5]) 27 Noop 0 0 0 0 End WHERE-core 28 Next 3 12 0 0 29 Next 4 9 0 0 30 Noop 0 0 0 0 End WHERE-loop0: edge 31 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 32 Copy 1 6 0 0 r[6]=r[1] 33 ResultRow 6 1 0 0 output=r[6] 34 Halt 0 0 0 0 35 Transaction 0 0 7 0 1 usesStmtJournal=0 36 Goto 0 1 0 0 ┌────────────────┐ │ count(_rowid_) │ ├────────────────┤ │ 1929 │ └────────────────┘ Run Time: real 0.124 user 0.015625 sys 0.015625 sqlite> SELECT count(_rowid_) ...> FROM edge ...> WHERE +node_from IN sub_nodes ...> AND +node_to IN sub_nodes ...> ; QUERY PLAN |--SCAN edge (~180224 rows) |--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR `--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 39 0 0 Start at 39 1 Null 0 1 2 0 r[1..2]=NULL 2 OpenRead 0 3 0 2 0 root=3 iDb=0; edge 3 ColumnsUsed 0 0 0 3 0 4 Explain 4 0 0 SCAN edge (~180224 rows) 0 5 Noop 0 0 0 0 Begin WHERE-loop0: edge 6 Rewind 0 34 0 0 7 Noop 0 0 0 0 begin IN expr 8 Once 0 10 0 0 9 OpenRead 3 4 0 1 0 root=4 iDb=0; sub_nodes 10 Column 0 0 3 0 r[3]=edge.node_from 11 SeekRowid 3 33 3 0 intkey=r[3] 12 Goto 0 18 0 0 13 Goto 0 33 0 0 14 Rewind 3 33 0 0 15 Column 3 0 4 0 r[4]= 16 Ne 3 33 4 BINARY-8 0 if r[4]!=r[3] goto 33 17 Goto 0 33 0 0 end IN expr 18 Noop 0 0 0 0 begin IN expr 19 Once 0 21 0 0 20 OpenRead 4 4 0 1 0 root=4 iDb=0; sub_nodes 21 Column 0 1 4 0 r[4]=edge.node_to 22 SeekRowid 4 33 4 0 intkey=r[4] 23 Goto 0 29 0 0 24 Goto 0 33 0 0 25 Rewind 4 33 0 0 26 Column 4 0 5 0 r[5]= 27 Ne 4 33 5 BINARY-8 0 if r[5]!=r[4] goto 33 28 Goto 0 33 0 0 end IN expr 29 Noop 0 0 0 0 Begin WHERE-core 30 Rowid 0 5 0 0 r[5]=rowid 31 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5]) 32 Noop 0 0 0 0 End WHERE-core 33 Next 0 7 0 1 34 Noop 0 0 0 0 End WHERE-loop0: edge 35 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 36 Copy 1 6 0 0 r[6]=r[1] 37 ResultRow 6 1 0 0 output=r[6] 38 Halt 0 0 0 0 39 Transaction 0 0 7 0 1 usesStmtJournal=0 40 Goto 0 1 0 0 ┌────────────────┐ │ count(_rowid_) │ ├────────────────┤ │ 1929 │ └────────────────┘ Run Time: real 0.151 user 0.031250 sys 0.046875 sqlite> SELECT count(_rowid_) ...> FROM edge ...> WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_from) ...> AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to) ...> ; QUERY PLAN |--SCAN edge (~180224 rows) |--CORRELATED SCALAR SUBQUERY 1 | `--SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row) `--CORRELATED SCALAR SUBQUERY 2 `--SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row) addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 49 0 0 Start at 49 1 Null 0 1 2 0 r[1..2]=NULL 2 OpenRead 0 3 0 2 0 root=3 iDb=0; edge 3 ColumnsUsed 0 0 0 3 0 4 Explain 4 0 0 SCAN edge (~180224 rows) 0 5 Noop 0 0 0 0 Begin WHERE-loop0: edge 6 Rewind 0 44 0 0 7 Integer 21 4 0 0 r[4]=21; return address 8 Integer 0 5 0 0 r[5]=0; Init EXISTS result 9 Integer 1 6 0 0 r[6]=1; LIMIT counter 10 OpenRead 1 4 0 0 0 root=4 iDb=0; sub_nodes 11 ColumnsUsed 1 0 0 0 0 12 Explain 12 0 0 SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0 13 Noop 0 0 0 0 Begin WHERE-loop0: sub_nodes 14 Column 0 0 7 0 r[7]=edge.node_from 15 SeekRowid 1 20 7 0 intkey=r[7] 16 Noop 0 0 0 0 Begin WHERE-core 17 Integer 1 5 0 0 r[5]=1 18 DecrJumpZero 6 21 0 0 if (--r[6])==0 goto 21 19 Noop 0 0 0 0 End WHERE-core 20 Noop 0 0 0 0 End WHERE-loop0: sub_nodes 21 Return 4 0 0 0 22 IfNot 5 43 1 0 23 Integer 37 9 0 0 r[9]=37; return address 24 Integer 0 10 0 0 r[10]=0; Init EXISTS result 25 Integer 1 11 0 0 r[11]=1; LIMIT counter 26 OpenRead 2 4 0 0 0 root=4 iDb=0; sub_nodes 27 ColumnsUsed 2 0 0 0 0 28 Explain 28 0 0 SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0 29 Noop 0 0 0 0 Begin WHERE-loop0: sub_nodes 30 Column 0 1 12 0 r[12]=edge.node_to 31 SeekRowid 2 36 12 0 intkey=r[12] 32 Noop 0 0 0 0 Begin WHERE-core 33 Integer 1 10 0 0 r[10]=1 34 DecrJumpZero 11 37 0 0 if (--r[11])==0 goto 37 35 Noop 0 0 0 0 End WHERE-core 36 Noop 0 0 0 0 End WHERE-loop0: sub_nodes 37 Return 9 0 0 0 38 IfNot 10 43 1 0 39 Noop 0 0 0 0 Begin WHERE-core 40 Rowid 0 3 0 0 r[3]=rowid 41 AggStep 0 3 1 count(1) 1 accum=r[1] step(r[3]) 42 Noop 0 0 0 0 End WHERE-core 43 Next 0 7 0 1 44 Noop 0 0 0 0 End WHERE-loop0: edge 45 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 46 Copy 1 14 0 0 r[14]=r[1] 47 ResultRow 14 1 0 0 output=r[14] 48 Halt 0 0 0 0 49 Transaction 0 0 7 0 1 usesStmtJournal=0 50 Goto 0 1 0 0 ┌────────────────┐ │ count(_rowid_) │ ├────────────────┤ │ 1929 │ └────────────────┘ Run Time: real 0.200 user 0.078125 sys 0.046875 sqlite> SELECT count(edge._rowid_) ...> FROM edge ...> JOIN sub_nodes as s1 ON s1.node_id == node_from ...> JOIN sub_nodes as s2 ON s2.node_id == node_to ...> ; QUERY PLAN |--SCAN s1 (~9216 rows) |--SEARCH edge USING COVERING INDEX edge_from_to (node_from=?) (~2 rows) `--SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 35 0 0 Start at 35 1 Null 0 1 2 0 r[1..2]=NULL 2 OpenRead 1 4 0 0 0 root=4 iDb=0; sub_nodes 3 ColumnsUsed 1 0 0 0 0 4 OpenRead 3 7 0 k(3,,,) 2 root=7 iDb=0; edge_from_to 5 ColumnsUsed 3 0 0 3 0 6 OpenRead 2 4 0 0 0 root=4 iDb=0; sub_nodes 7 ColumnsUsed 2 0 0 0 0 8 Explain 8 0 0 SCAN s1 (~9216 rows) 0 9 Noop 0 0 0 0 Begin WHERE-loop0: sub_nodes 10 Rewind 1 30 0 0 11 Explain 11 0 0 SEARCH edge USING COVERING INDEX edge_from_to (node_from=?) (~2 rows) 0 12 Noop 0 0 0 0 Begin WHERE-loop1: edge 13 Rowid 1 3 0 0 r[3]=rowid 14 CursorHint 3 0 0 EQ(r[3],c0) 0 15 Rowid 1 4 0 0 r[4]=rowid 16 SeekGE 3 28 4 1 0 key=r[4] 17 IdxGT 3 28 4 1 0 key=r[4] 18 Explain 18 0 0 SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0 19 Noop 0 0 0 0 Begin WHERE-loop2: sub_nodes 20 Column 3 1 5 0 r[5]=edge.node_to 21 SeekRowid 2 26 5 0 intkey=r[5] 22 Noop 0 0 0 0 Begin WHERE-core 23 IdxRowid 3 6 0 0 r[6]=rowid 24 AggStep 0 6 1 count(1) 1 accum=r[1] step(r[6]) 25 Noop 0 0 0 0 End WHERE-core 26 Noop 0 0 0 0 End WHERE-loop2: sub_nodes 27 Next 3 17 0 0 28 Noop 0 0 0 0 End WHERE-loop1: edge 29 Next 1 11 0 1 30 Noop 0 0 0 0 End WHERE-loop0: sub_nodes 31 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 32 Copy 1 7 0 0 r[7]=r[1] 33 ResultRow 7 1 0 0 output=r[7] 34 Halt 0 0 0 0 35 Transaction 0 0 7 0 1 usesStmtJournal=0 36 Goto 0 1 0 0 ┌─────────────────────┐ │ count(edge._rowid_) │ ├─────────────────────┤ │ 1929 │ └─────────────────────┘ Run Time: real 0.114 user 0.015625 sys 0.000000 sqlite> SELECT count(edge._rowid_) ...> FROM edge ...> JOIN sub_nodes as s1 ON s1.node_id == +node_from ...> JOIN sub_nodes as s2 ON s2.node_id == +node_to ...> ; QUERY PLAN |--SCAN edge (~196608 rows) |--SEARCH s1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) `--SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 31 0 0 Start at 31 1 Null 0 1 2 0 r[1..2]=NULL 2 OpenRead 0 3 0 2 0 root=3 iDb=0; edge 3 ColumnsUsed 0 0 0 3 0 4 OpenRead 1 4 0 0 0 root=4 iDb=0; sub_nodes 5 ColumnsUsed 1 0 0 0 0 6 OpenRead 2 4 0 0 0 root=4 iDb=0; sub_nodes 7 ColumnsUsed 2 0 0 0 0 8 Explain 8 0 0 SCAN edge (~196608 rows) 0 9 Noop 0 0 0 0 Begin WHERE-loop0: edge 10 Rewind 0 26 0 0 11 Explain 11 0 0 SEARCH s1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0 12 Noop 0 0 0 0 Begin WHERE-loop1: sub_nodes 13 Column 0 0 3 0 r[3]=edge.node_from 14 SeekRowid 1 24 3 0 intkey=r[3] 15 Explain 15 0 0 SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row) 0 16 Noop 0 0 0 0 Begin WHERE-loop2: sub_nodes 17 Column 0 1 4 0 r[4]=edge.node_to 18 SeekRowid 2 23 4 0 intkey=r[4] 19 Noop 0 0 0 0 Begin WHERE-core 20 Rowid 0 5 0 0 r[5]=rowid 21 AggStep 0 5 1 count(1) 1 accum=r[1] step(r[5]) 22 Noop 0 0 0 0 End WHERE-core 23 Noop 0 0 0 0 End WHERE-loop2: sub_nodes 24 Noop 0 0 0 0 End WHERE-loop1: sub_nodes 25 Next 0 11 0 1 26 Noop 0 0 0 0 End WHERE-loop0: edge 27 AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 28 Copy 1 6 0 0 r[6]=r[1] 29 ResultRow 6 1 0 0 output=r[6] 30 Halt 0 0 0 0 31 Transaction 0 0 7 0 1 usesStmtJournal=0 32 Goto 0 1 0 0 ┌─────────────────────┐ │ count(edge._rowid_) │ ├─────────────────────┤ │ 1929 │ └─────────────────────┘ Run Time: real 0.138 user 0.046875 sys 0.031250 sqlite> ```
(10) By mlin (dnamlin) on 2021-06-02 02:25:19 in reply to 9 [link]
Keith: agreed. May I also suggest 10X'ing the problem size (N on line 8 of my script) from 100K to 1M. The result set size is ~20K instead of ~2K. I'm pretty sure I saw one or more of the query plans change in that case. Very interesting indeed! Mike
(11) By Richard Hipp (drh) on 2021-06-02 12:47:10 in reply to 1 [link]
The following script demonstrates the problem. I am adding the script to this forum thread to save is as part of the historical record: ~~~ /* Adjust $N and $NSUB as necessary to increase/descrease the ** problem size. $NSUB was $N/10 in the original problem */ .param set $N 10000 .param set $NSUB 1000 CREATE TABLE node(node_id INTEGER PRIMARY KEY); CREATE TABLE edge(node_from INT, node_to INT); CREATE TABLE sub_nodes(node_id INTEGER PRIMARY KEY); WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N ) INSERT INTO node(node_id) SELECT NULL FROM s; WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N ) INSERT INTO edge(node_from, node_to) SELECT i, i+1 FROM s; WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N ) INSERT INTO edge(node_from, node_to) SELECT i, abs((i*1103515245 + 12345) % $N) FROM s; WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$NSUB ) REPLACE INTO sub_nodes(node_id) SELECT abs(i) FROM s; CREATE INDEX edge_from_to ON edge(node_from,node_to); CREATE INDEX edge_to_from ON edge(node_to,node_from); ANALYZE; .mode box SELECT * FROM sqlite_stat1; .stats vmstep .eqp on .echo on SELECT count(*) FROM edge WHERE node_from IN sub_nodes AND node_to IN sub_nodes; SELECT count(*) FROM edge WHERE +node_from IN sub_nodes AND node_to IN sub_nodes; SELECT count(*) FROM edge NOT INDEXED WHERE node_from IN sub_nodes AND node_to IN sub_nodes; ~~~
(12) By Richard Hipp (drh) on 2021-06-04 18:29:07 in reply to 1 [link]
I believe this issue is now resolved on trunk, [check-in 9a2ab6092d644fc3][1] and later. Please confirm. [1]: src:/timeline?c=9a2ab6092d644fc3
(13) By mlin (dnamlin) on 2021-06-05 12:46:12 in reply to 12 [link]
Yes, tested this version with both the repro script and the real app -- looks good! Thanks! The patch's effect seems to be to avoid using OP_SeekScan in the cases I looked at the bytecode for -- so just the caveat, I didn't directly observe the query running fast *and* using OP_SeekScan.
(14) By Richard Hipp (drh) on 2021-06-05 14:40:00 in reply to 13
One of our clients has real-world cases that run slowly if OP_SeekScan is turned off. In fact, OP_SeekScan was added specifically to support that client. We have cases in the TH3 test suite that run more slowly when OP_SeekScan is disabled. If you don't want to ever use OP_SeekScan, you can now disable it at run-time using the [SQLITE_TESTCTRL_OPTIMIZATIONS][2] [test-control][1] with a mask of 0x20000: > sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x20000); In the CLI use this dot-command: > .testctrl optimizations 0x20000 If you discover a test case that runs faster with OP_SeekScan disabled, please bring it to our attention by posting it on this forum. [1]: https://www.sqlite.org/c3ref/test_control.html [2]: https://www.sqlite.org/c3ref/c_testctrl_always.html