Query performance regression after v3.33.0 (OP_SeekScan?)

(1) By mlin (dnamlin) on 2021-06-01 06:30:08 [source]

My application hit a severe performance regression upgrading SQLite to v3.35.4 from v3.33.0, I think possibly due to the new OP_SeekScan feature. I've distilled a reproduction in this gist:

https://gist.github.com/mlin/a5e9ef7ca0d6583b8397a0fb324e4327

Briefly, our database stores a graph, with nodes, edges connecting them, and indexes thereof. Then we get a 1-column temporary table providing a certain subset of node IDs, and we wish to query for those edges touching only nodes included in the subset:

SELECT count(_rowid_)
FROM edge
WHERE node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes

With the numerous constants in the Python script, SQLite v3.35.5 runs this query roughly 500X slower than v3.31.1. (My Rust application, on which the repro script is based, hit the problem after upgrading from v3.33.0 to v3.35.4)

The high-level "explain query plans" are identical between the two versions, but, the opcode-level "explain" results differ (shown in the gist), notably including the new SeekScan operation. I speculate perhaps the two "IN" clauses in our query lead to some kind of quadratic cost to the new scan attempt?

Thanks in advance for advice/attention.

(2) By mlin (dnamlin) on 2021-06-01 11:54:02 in reply to 1 [link] [source]

Brief PS: simply removing indexes from this example (deleting lines 40 & 41 from my repro script) makes the query hundreds of times faster using v3.35.5, and the plan doesn't use SeekScan in that case. It smells to me like the SeekScan scanning effort parameter (This.P1) might somehow be off by orders of magnitude in the circumstances I've created.

(3) By anonymous on 2021-06-01 13:54:47 in reply to 1 [link] [source]

Any reason not to use a three-way join?

    select count(*)
    from edge,
        temp.sub_nodes src on src.node_id=node_from,
        temp.sub_nodes dst on dst.node_id=node_to;

(4) By Richard Hipp (drh) on 2021-06-01 16:48:13 in reply to 1 [link] [source]

As a temporary work-around, please try modifying your query by putting a single "+" in from of the node_from in the WHERE clause:

SELECT count(_rowid_)
FROM edge
WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes;
  --  ^-- extra "+" here

(5) By mlin (dnamlin) on 2021-06-01 20:51:46 in reply to 3 [link] [source]

Any reason not to use a three-way join?

Thanks for this suggestion! It does solve the problem in SQLite v3.35, but makes it ~equally as fast as the original version in SQLite v3.31, which suggests the query planner is, impressively, supposed to run my original (more concise and readable?) version in effectively the same way.

Original plus Richard's workaround:

EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge
WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes

SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR

Three-way join:

EXPLAIN QUERY PLAN SELECT count(edge._rowid_) FROM edge,
     temp.sub_nodes sub_from ON sub_from.node_id = node_from,
     temp.sub_nodes sub_to ON sub_to.node_id = node_to

SCAN TABLE sub_nodes AS sub_from
SEARCH TABLE edge USING COVERING INDEX edge_from_to (node_from=?)
SEARCH TABLE sub_nodes AS sub_to USING INTEGER PRIMARY KEY (rowid=?)

The runtimes of these two queries remain roughly equivalent with increasing problem size.

(6) By mlin (dnamlin) on 2021-06-01 20:55:12 in reply to 4 [link] [source]

Yes, this worked like a charm, both in my repro script and in the full application. With the hint, the bytecode no longer uses SeekScan -- one more piece of evidence pointing in that direction. Thank you!!

(7) By Keith Medcalf (kmedcalf) on 2021-06-01 22:34:26 in reply to 1 [link] [source]

Should not that query be exactly equal to:

SELECT count(_rowid_)
  FROM edge
 WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_from)
   AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to)
;

since value IN table means where exists a row in table where the primary key of table == value?

(8) By mlin (dnamlin) on 2021-06-01 23:30:31 in reply to 7 [link] [source]

I can agree that these queries should produce the same exact result sets, but the query planner certainly seems to distinguish them. All with v3.35.5 and problem size N=1000000:

Mike's version plus Richard's hint (WHERE...IN)

EXPLAIN QUERY PLAN SELECT count(_rowid_) FROM edge
WHERE +node_from IN temp.sub_nodes AND node_to IN temp.sub_nodes

SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR

anonymous' three-way join

EXPLAIN QUERY PLAN SELECT count(edge._rowid_) FROM edge,
     temp.sub_nodes sub_from ON sub_from.node_id = node_from,
     temp.sub_nodes sub_to ON sub_to.node_id = node_to

SCAN TABLE sub_nodes AS sub_from
SEARCH TABLE edge USING COVERING INDEX edge_from_to (node_from=?)
SEARCH TABLE sub_nodes AS sub_to USING INTEGER PRIMARY KEY (rowid=?)

Keith's version plus Richard's hint (WHERE...EXISTS)

EXPLAIN QUERY PLAN SELECT count(_rowid_)
  FROM edge
 WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == +node_from)
   AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to)

SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
CORRELATED SCALAR SUBQUERY 1
    SEARCH TABLE sub_nodes USING INTEGER PRIMARY KEY (rowid=?)

Mike's or Keith's version without Richard's hint (disastrously slow, using SeekScan)

SEARCH TABLE edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR

Beauty is in the eye of the beholder, but my version is the most concise and most closely approximates what I'd say in English ("gimme the edges whose node_from and node_to are both in sub_nodes). And SQLite v3.33.0 did great with it even without the hint.

I'm certainly learning a ton from all this though, thanks!

(9) By Keith Medcalf (kmedcalf) on 2021-06-02 00:49:47 in reply to 8 [link] [source]

Using the current tip of trunk (my version of a8d921136f) I get the following results (this is after analyze being run).

Clearly the different spellings of the same request result in wildly different query plans and vdbe code being generated, the most efficient being the "direct join" where the planner has the most latitude to choose the nesting order.

The interesting thing is that the IN form should be no less efficient that the EXISTS form, but it is.

sqlite> SELECT count(_rowid_)
   ...>   FROM edge
   ...>  WHERE node_from IN sub_nodes
   ...>    AND node_to IN sub_nodes
   ...> ;
QUERY PLAN
|--SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) (~44 rows)
|--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
`--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     31    0                    0   Start at 31
1     Null           0     1     2                    0   r[1..2]=NULL
2     OpenRead       3     8     0     k(3,,,)        0   root=8 iDb=0; edge_to_from
3     ColumnsUsed    3     0     0     3              0
4     Explain        4     0     0     SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?) (~44 rows)  0
5     Noop           0     0     0                    0   Begin WHERE-loop0: edge
6     Once           0     8     0                    0
7     OpenRead       4     4     0     1              0   root=4 iDb=0; sub_nodes
8     Rewind         4     26    0                    0
9       Rowid          4     3     0                    0   r[3]=rowid
10      IsNull         3     25    0                    0   if r[3]==NULL goto 25
11      Once           0     13    0                    0
12      OpenRead       5     4     0     1              0   root=4 iDb=0; sub_nodes
13      Rewind         5     25    0                    0
14        Rowid          5     4     0                    0   r[4]=rowid
15        IsNull         4     24    0                    0   if r[4]==NULL goto 24
16        SeekScan       18    19    0                    0   Scan-ahead up to 18 rows
17        SeekGE         3     24    3     2              0   key=r[3..4]
18          IdxGT          3     24    3     2              0   key=r[3..4]
19          Noop           0     0     0                    0   Begin WHERE-core
20          IdxRowid       3     5     0                    0   r[5]=rowid
21          AggStep        0     5     1     count(1)       1   accum=r[1] step(r[5])
22          Noop           0     0     0                    0   End WHERE-core
23        Next           3     18    0                    0
24      Next           5     14    0                    0
25    Next           4     9     0                    0
26    Noop           0     0     0                    0   End WHERE-loop0: edge
27    AggFinal       1     1     0     count(1)       0   accum=r[1] N=1
28    Copy           1     6     0                    0   r[6]=r[1]
29    ResultRow      6     1     0                    0   output=r[6]
30    Halt           0     0     0                    0
31    Transaction    0     0     7     0              1   usesStmtJournal=0
32    Goto           0     1     0                    0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929           │
└────────────────┘
Run Time: real 5.817 user 5.703125 sys 0.031250
sqlite> SELECT count(_rowid_)
   ...>   FROM edge
   ...>  WHERE +node_from IN sub_nodes
   ...>    AND node_to IN sub_nodes
   ...> ;
QUERY PLAN
|--SEARCH edge USING COVERING INDEX edge_to_from (node_to=?) (~44 rows)
|--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
`--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     35    0                    0   Start at 35
1     Null           0     1     2                    0   r[1..2]=NULL
2     OpenRead       3     8     0     k(3,,,)        2   root=8 iDb=0; edge_to_from
3     ColumnsUsed    3     0     0     3              0
4     Explain        4     0     0     SEARCH edge USING COVERING INDEX edge_to_from (node_to=?) (~44 rows)  0
5     Noop           0     0     0                    0   Begin WHERE-loop0: edge
6     Once           0     8     0                    0
7     OpenRead       4     4     0     1              0   root=4 iDb=0; sub_nodes
8     Rewind         4     30    0                    0
9       Rowid          4     3     0                    0   r[3]=rowid
10      IsNull         3     29    0                    0   if r[3]==NULL goto 29
11      SeekGE         3     29    3     1              0   key=r[3]
12        IdxGT          3     29    3     1              0   key=r[3]
13        Noop           0     0     0                    0   begin IN expr
14        Once           0     16    0                    0
15        OpenRead       5     4     0     1              0   root=4 iDb=0; sub_nodes
16        Column         3     1     4                    0   r[4]=edge.node_from
17        SeekRowid      5     28    4                    0   intkey=r[4]
18        Goto           0     24    0                    0
19        Goto           0     28    0                    0
20        Rewind         5     28    0                    0
21        Column         5     0     5                    0   r[5]=
22        Ne             4     28    5     BINARY-8       0   if r[5]!=r[4] goto 28
23        Goto           0     28    0                    0   end IN expr
24        Noop           0     0     0                    0   Begin WHERE-core
25        IdxRowid       3     5     0                    0   r[5]=rowid
26        AggStep        0     5     1     count(1)       1   accum=r[1] step(r[5])
27        Noop           0     0     0                    0   End WHERE-core
28      Next           3     12    0                    0
29    Next           4     9     0                    0
30    Noop           0     0     0                    0   End WHERE-loop0: edge
31    AggFinal       1     1     0     count(1)       0   accum=r[1] N=1
32    Copy           1     6     0                    0   r[6]=r[1]
33    ResultRow      6     1     0                    0   output=r[6]
34    Halt           0     0     0                    0
35    Transaction    0     0     7     0              1   usesStmtJournal=0
36    Goto           0     1     0                    0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929           │
└────────────────┘
Run Time: real 0.124 user 0.015625 sys 0.015625
sqlite> SELECT count(_rowid_)
   ...>   FROM edge
   ...>  WHERE +node_from IN sub_nodes
   ...>    AND +node_to IN sub_nodes
   ...> ;
QUERY PLAN
|--SCAN edge (~180224 rows)
|--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
`--USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     39    0                    0   Start at 39
1     Null           0     1     2                    0   r[1..2]=NULL
2     OpenRead       0     3     0     2              0   root=3 iDb=0; edge
3     ColumnsUsed    0     0     0     3              0
4     Explain        4     0     0     SCAN edge (~180224 rows)  0
5     Noop           0     0     0                    0   Begin WHERE-loop0: edge
6     Rewind         0     34    0                    0
7       Noop           0     0     0                    0   begin IN expr
8       Once           0     10    0                    0
9       OpenRead       3     4     0     1              0   root=4 iDb=0; sub_nodes
10      Column         0     0     3                    0   r[3]=edge.node_from
11      SeekRowid      3     33    3                    0   intkey=r[3]
12      Goto           0     18    0                    0
13      Goto           0     33    0                    0
14      Rewind         3     33    0                    0
15      Column         3     0     4                    0   r[4]=
16      Ne             3     33    4     BINARY-8       0   if r[4]!=r[3] goto 33
17      Goto           0     33    0                    0   end IN expr
18      Noop           0     0     0                    0   begin IN expr
19      Once           0     21    0                    0
20      OpenRead       4     4     0     1              0   root=4 iDb=0; sub_nodes
21      Column         0     1     4                    0   r[4]=edge.node_to
22      SeekRowid      4     33    4                    0   intkey=r[4]
23      Goto           0     29    0                    0
24      Goto           0     33    0                    0
25      Rewind         4     33    0                    0
26      Column         4     0     5                    0   r[5]=
27      Ne             4     33    5     BINARY-8       0   if r[5]!=r[4] goto 33
28      Goto           0     33    0                    0   end IN expr
29      Noop           0     0     0                    0   Begin WHERE-core
30      Rowid          0     5     0                    0   r[5]=rowid
31      AggStep        0     5     1     count(1)       1   accum=r[1] step(r[5])
32      Noop           0     0     0                    0   End WHERE-core
33    Next           0     7     0                    1
34    Noop           0     0     0                    0   End WHERE-loop0: edge
35    AggFinal       1     1     0     count(1)       0   accum=r[1] N=1
36    Copy           1     6     0                    0   r[6]=r[1]
37    ResultRow      6     1     0                    0   output=r[6]
38    Halt           0     0     0                    0
39    Transaction    0     0     7     0              1   usesStmtJournal=0
40    Goto           0     1     0                    0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929           │
└────────────────┘
Run Time: real 0.151 user 0.031250 sys 0.046875
sqlite> SELECT count(_rowid_)
   ...>   FROM edge
   ...>  WHERE EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_from)
   ...>    AND EXISTS (SELECT * FROM sub_nodes WHERE node_id == node_to)
   ...> ;
QUERY PLAN
|--SCAN edge (~180224 rows)
|--CORRELATED SCALAR SUBQUERY 1
|  `--SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--CORRELATED SCALAR SUBQUERY 2
   `--SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     49    0                    0   Start at 49
1     Null           0     1     2                    0   r[1..2]=NULL
2     OpenRead       0     3     0     2              0   root=3 iDb=0; edge
3     ColumnsUsed    0     0     0     3              0
4     Explain        4     0     0     SCAN edge (~180224 rows)  0
5     Noop           0     0     0                    0   Begin WHERE-loop0: edge
6     Rewind         0     44    0                    0
7       Integer        21    4     0                    0   r[4]=21; return address
8       Integer        0     5     0                    0   r[5]=0; Init EXISTS result
9       Integer        1     6     0                    0   r[6]=1; LIMIT counter
10      OpenRead       1     4     0     0              0   root=4 iDb=0; sub_nodes
11      ColumnsUsed    1     0     0     0              0
12      Explain        12    0     0     SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row)  0
13      Noop           0     0     0                    0   Begin WHERE-loop0: sub_nodes
14      Column         0     0     7                    0   r[7]=edge.node_from
15      SeekRowid      1     20    7                    0   intkey=r[7]
16      Noop           0     0     0                    0   Begin WHERE-core
17      Integer        1     5     0                    0   r[5]=1
18      DecrJumpZero   6     21    0                    0   if (--r[6])==0 goto 21
19      Noop           0     0     0                    0   End WHERE-core
20      Noop           0     0     0                    0   End WHERE-loop0: sub_nodes
21      Return         4     0     0                    0
22      IfNot          5     43    1                    0
23      Integer        37    9     0                    0   r[9]=37; return address
24      Integer        0     10    0                    0   r[10]=0; Init EXISTS result
25      Integer        1     11    0                    0   r[11]=1; LIMIT counter
26      OpenRead       2     4     0     0              0   root=4 iDb=0; sub_nodes
27      ColumnsUsed    2     0     0     0              0
28      Explain        28    0     0     SEARCH sub_nodes USING INTEGER PRIMARY KEY (rowid=?) (~1 row)  0
29      Noop           0     0     0                    0   Begin WHERE-loop0: sub_nodes
30      Column         0     1     12                   0   r[12]=edge.node_to
31      SeekRowid      2     36    12                   0   intkey=r[12]
32      Noop           0     0     0                    0   Begin WHERE-core
33      Integer        1     10    0                    0   r[10]=1
34      DecrJumpZero   11    37    0                    0   if (--r[11])==0 goto 37
35      Noop           0     0     0                    0   End WHERE-core
36      Noop           0     0     0                    0   End WHERE-loop0: sub_nodes
37      Return         9     0     0                    0
38      IfNot          10    43    1                    0
39      Noop           0     0     0                    0   Begin WHERE-core
40      Rowid          0     3     0                    0   r[3]=rowid
41      AggStep        0     3     1     count(1)       1   accum=r[1] step(r[3])
42      Noop           0     0     0                    0   End WHERE-core
43    Next           0     7     0                    1
44    Noop           0     0     0                    0   End WHERE-loop0: edge
45    AggFinal       1     1     0     count(1)       0   accum=r[1] N=1
46    Copy           1     14    0                    0   r[14]=r[1]
47    ResultRow      14    1     0                    0   output=r[14]
48    Halt           0     0     0                    0
49    Transaction    0     0     7     0              1   usesStmtJournal=0
50    Goto           0     1     0                    0
┌────────────────┐
│ count(_rowid_) │
├────────────────┤
│ 1929           │
└────────────────┘
Run Time: real 0.200 user 0.078125 sys 0.046875
sqlite> SELECT count(edge._rowid_)
   ...>  FROM edge
   ...>  JOIN sub_nodes as s1 ON s1.node_id == node_from
   ...>  JOIN sub_nodes as s2 ON s2.node_id == node_to
   ...> ;
QUERY PLAN
|--SCAN s1 (~9216 rows)
|--SEARCH edge USING COVERING INDEX edge_from_to (node_from=?) (~2 rows)
`--SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     35    0                    0   Start at 35
1     Null           0     1     2                    0   r[1..2]=NULL
2     OpenRead       1     4     0     0              0   root=4 iDb=0; sub_nodes
3     ColumnsUsed    1     0     0     0              0
4     OpenRead       3     7     0     k(3,,,)        2   root=7 iDb=0; edge_from_to
5     ColumnsUsed    3     0     0     3              0
6     OpenRead       2     4     0     0              0   root=4 iDb=0; sub_nodes
7     ColumnsUsed    2     0     0     0              0
8     Explain        8     0     0     SCAN s1 (~9216 rows)  0
9     Noop           0     0     0                    0   Begin WHERE-loop0: sub_nodes
10    Rewind         1     30    0                    0
11      Explain        11    0     0     SEARCH edge USING COVERING INDEX edge_from_to (node_from=?) (~2 rows)  0
12      Noop           0     0     0                    0   Begin WHERE-loop1: edge
13      Rowid          1     3     0                    0   r[3]=rowid
14      CursorHint     3     0     0     EQ(r[3],c0)    0
15      Rowid          1     4     0                    0   r[4]=rowid
16      SeekGE         3     28    4     1              0   key=r[4]
17        IdxGT          3     28    4     1              0   key=r[4]
18        Explain        18    0     0     SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)  0
19        Noop           0     0     0                    0   Begin WHERE-loop2: sub_nodes
20        Column         3     1     5                    0   r[5]=edge.node_to
21        SeekRowid      2     26    5                    0   intkey=r[5]
22        Noop           0     0     0                    0   Begin WHERE-core
23        IdxRowid       3     6     0                    0   r[6]=rowid
24        AggStep        0     6     1     count(1)       1   accum=r[1] step(r[6])
25        Noop           0     0     0                    0   End WHERE-core
26        Noop           0     0     0                    0   End WHERE-loop2: sub_nodes
27      Next           3     17    0                    0
28      Noop           0     0     0                    0   End WHERE-loop1: edge
29    Next           1     11    0                    1
30    Noop           0     0     0                    0   End WHERE-loop0: sub_nodes
31    AggFinal       1     1     0     count(1)       0   accum=r[1] N=1
32    Copy           1     7     0                    0   r[7]=r[1]
33    ResultRow      7     1     0                    0   output=r[7]
34    Halt           0     0     0                    0
35    Transaction    0     0     7     0              1   usesStmtJournal=0
36    Goto           0     1     0                    0
┌─────────────────────┐
│ count(edge._rowid_) │
├─────────────────────┤
│ 1929                │
└─────────────────────┘
Run Time: real 0.114 user 0.015625 sys 0.000000
sqlite> SELECT count(edge._rowid_)
   ...>  FROM edge
   ...>  JOIN sub_nodes as s1 ON s1.node_id == +node_from
   ...>  JOIN sub_nodes as s2 ON s2.node_id == +node_to
   ...> ;
QUERY PLAN
|--SCAN edge (~196608 rows)
|--SEARCH s1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
`--SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     31    0                    0   Start at 31
1     Null           0     1     2                    0   r[1..2]=NULL
2     OpenRead       0     3     0     2              0   root=3 iDb=0; edge
3     ColumnsUsed    0     0     0     3              0
4     OpenRead       1     4     0     0              0   root=4 iDb=0; sub_nodes
5     ColumnsUsed    1     0     0     0              0
6     OpenRead       2     4     0     0              0   root=4 iDb=0; sub_nodes
7     ColumnsUsed    2     0     0     0              0
8     Explain        8     0     0     SCAN edge (~196608 rows)  0
9     Noop           0     0     0                    0   Begin WHERE-loop0: edge
10    Rewind         0     26    0                    0
11      Explain        11    0     0     SEARCH s1 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)  0
12      Noop           0     0     0                    0   Begin WHERE-loop1: sub_nodes
13      Column         0     0     3                    0   r[3]=edge.node_from
14      SeekRowid      1     24    3                    0   intkey=r[3]
15      Explain        15    0     0     SEARCH s2 USING INTEGER PRIMARY KEY (rowid=?) (~1 row)  0
16      Noop           0     0     0                    0   Begin WHERE-loop2: sub_nodes
17      Column         0     1     4                    0   r[4]=edge.node_to
18      SeekRowid      2     23    4                    0   intkey=r[4]
19      Noop           0     0     0                    0   Begin WHERE-core
20      Rowid          0     5     0                    0   r[5]=rowid
21      AggStep        0     5     1     count(1)       1   accum=r[1] step(r[5])
22      Noop           0     0     0                    0   End WHERE-core
23      Noop           0     0     0                    0   End WHERE-loop2: sub_nodes
24      Noop           0     0     0                    0   End WHERE-loop1: sub_nodes
25    Next           0     11    0                    1
26    Noop           0     0     0                    0   End WHERE-loop0: edge
27    AggFinal       1     1     0     count(1)       0   accum=r[1] N=1
28    Copy           1     6     0                    0   r[6]=r[1]
29    ResultRow      6     1     0                    0   output=r[6]
30    Halt           0     0     0                    0
31    Transaction    0     0     7     0              1   usesStmtJournal=0
32    Goto           0     1     0                    0
┌─────────────────────┐
│ count(edge._rowid_) │
├─────────────────────┤
│ 1929                │
└─────────────────────┘
Run Time: real 0.138 user 0.046875 sys 0.031250
sqlite>

(10) By mlin (dnamlin) on 2021-06-02 02:25:19 in reply to 9 [link] [source]

Keith: agreed. May I also suggest 10X'ing the problem size (N on line 8 of my script) from 100K to 1M. The result set size is ~20K instead of ~2K. I'm pretty sure I saw one or more of the query plans change in that case. Very interesting indeed! Mike

(11) By Richard Hipp (drh) on 2021-06-02 12:47:10 in reply to 1 [link] [source]

The following script demonstrates the problem. I am adding the script to this forum thread to save is as part of the historical record:

/* Adjust $N and $NSUB as necessary to increase/descrease the
** problem size.  $NSUB was $N/10 in the original problem */
.param set $N 10000
.param set $NSUB 1000
CREATE TABLE node(node_id INTEGER PRIMARY KEY);
CREATE TABLE edge(node_from INT, node_to INT);
CREATE TABLE sub_nodes(node_id INTEGER PRIMARY KEY);
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N )
  INSERT INTO node(node_id) SELECT NULL FROM s;
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N )
  INSERT INTO edge(node_from, node_to) SELECT i, i+1 FROM s;
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$N )
  INSERT INTO edge(node_from, node_to)
    SELECT i, abs((i*1103515245 + 12345) % $N) FROM s;
WITH s(i) AS ( SELECT 1 UNION ALL SELECT i+1 FROM s WHERE i<$NSUB )
  REPLACE INTO sub_nodes(node_id) SELECT abs(i) FROM s;
CREATE INDEX edge_from_to ON edge(node_from,node_to);
CREATE INDEX edge_to_from ON edge(node_to,node_from);
ANALYZE;

.mode box
SELECT * FROM sqlite_stat1;
.stats vmstep
.eqp on
.echo on

SELECT count(*) FROM edge 
 WHERE node_from IN sub_nodes AND node_to IN sub_nodes;

SELECT count(*) FROM edge 
 WHERE +node_from IN sub_nodes AND node_to IN sub_nodes;

SELECT count(*) FROM edge NOT INDEXED
 WHERE node_from IN sub_nodes AND node_to IN sub_nodes;

(12) By Richard Hipp (drh) on 2021-06-04 18:29:07 in reply to 1 [link] [source]

I believe this issue is now resolved on trunk, check-in 9a2ab6092d644fc3 and later. Please confirm.

(13) By mlin (dnamlin) on 2021-06-05 12:46:12 in reply to 12 [link] [source]

Yes, tested this version with both the repro script and the real app -- looks good! Thanks! The patch's effect seems to be to avoid using OP_SeekScan in the cases I looked at the bytecode for -- so just the caveat, I didn't directly observe the query running fast and using OP_SeekScan.

(14) By Richard Hipp (drh) on 2021-06-05 14:40:00 in reply to 13 [link] [source]

One of our clients has real-world cases that run slowly if OP_SeekScan is turned off. In fact, OP_SeekScan was added specifically to support that client. We have cases in the TH3 test suite that run more slowly when OP_SeekScan is disabled.

If you don't want to ever use OP_SeekScan, you can now disable it at run-time using the SQLITE_TESTCTRL_OPTIMIZATIONS test-control with a mask of 0x20000:

 sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x20000);

In the CLI use this dot-command:

.testctrl optimizations 0x20000

If you discover a test case that runs faster with OP_SeekScan disabled, please bring it to our attention by posting it on this forum.