SQLite User Forum

Covering index regression with virtual columns (3.34.0)
Login

Covering index regression with virtual columns (3.34.0)

(1) By mlin (dnamlin) on 2022-12-21 01:38:54 [source]

Aloha all,

I'm seeing an apparent regression in a plan using a covering index in SQLite 3.34.0 compared to 3.39.3 and earlier versions. The schema is designed for interval overlap queries, like R-Tree but simplified in one dimension, and includes an index on virtual columns defined as simple expressions on normal columns (trivial in the following sufficient example). I see a few potentially-relevant entries in the release history for 3.34.0; appreciate if somebody can take a look. Thank you, happy holidays!

./sqlite3 :memory: "
CREATE TABLE t(
    beg INTEGER NOT NULL, end INTEGER NOT NULL,
    _beg INTEGER AS (beg) VIRTUAL, _len INTEGER AS (end-beg) VIRTUAL
);
CREATE INDEX t_i ON t(_beg, _len);
EXPLAIN SELECT _rowid_ FROM t INDEXED BY t_i WHERE _beg BETWEEN 100 AND 200 AND _beg+_len >= 150;
"

The index t_i should cover the query. With SQLite 3.39.3 or earlier, the plan is as desired, where the Column instructions (addrs 8 & 9) both read from the index t_i (cursor 1), not the table t (cursor 0).

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   
1     OpenRead       0     2     0     2              0   
2     OpenRead       1     3     0     k(3,,,)        0   
3     Integer        100   1     0                    0   
4     SeekGE         1     15    1     1              0   
5     Integer        200   1     0                    0   
6       IdxGT          1     15    1     1              0   
7       DeferredSeek   1     0     0                    0   
8       Column         1     0     3                    0   
9       Column         1     1     4                    0   
10      Add            4     3     2                    0   
11      Lt             5     14    2                    80  
12      IdxRowid       1     6     0                    0   
13      ResultRow      6     1     0                    0   
14    Next           1     6     0                    0   
15    Halt           0     0     0                    0   
16    Transaction    0     0     2     0              1   
17    Integer        150   5     0                    0   
18    Goto           0     1     0                    0   

With 3.34.0 we get the following. Notice that the first Column instruction (addr 9) appears to access the table t (cursor 0), not the index t_i (cursor 1), which if I'm not mistaken indicates the covering index isn't being used to its potential, and at scale we'll incur a lot more page misses.

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     20    0                    0   
1     OpenRead       0     2     0     2              0   
2     OpenRead       1     3     0     k(3,,,)        0   
3     Integer        100   1     0                    0   
4     SeekGE         1     19    1     1              0   
5     Integer        200   1     0                    0   
6       IdxGT          1     19    1     1              0   
7       DeferredSeek   1     0     0                    0   
8       IfNullRow      1     11    3                    0   
9       Column         0     0     3                    0   
10      Affinity       3     1     0     D              0   
11      IfNullRow      1     14    4                    0   
12      Column         1     1     4                    0   
13      Affinity       4     1     0     D              0   
14      Add            4     3     2                    0   
15      Lt             5     18    2                    80  
16      IdxRowid       1     6     0                    0   
17      ResultRow      6     1     0                    0   
18    Next           1     6     0                    0   
19    Halt           0     0     0                    0   
20    Transaction    0     0     2     0              1   
21    Integer        150   5     0                    0   
22    Goto           0     1     0                    0   

The regression seems to hinge on the indirection of the virtual columns; i.e. if we simplify the example without them, then the index covers as expected. The whole example is already simplified (thus the virtual columns appearing trivial/unnecessary) from the real application use case described here.

(2) By Larry Brasfield (larrybr) on 2022-12-21 02:09:16 in reply to 1 [link] [source]

Your identification of regression is confusing to me.

For your 1st query plan, you say: "With SQLite 3.39.3 or earlier, the plan is as desired"
For your 2nd query plan, you say: "With 3.34.0 we get the following. ... at scale we'll incur a lot more page misses."

To me, that language indicates "progression" rather than "regression". The plan you like better occurs with a later SQLite version. It happens to match the QP that the current tip of trunk produces also.

These observations lead me to suggest that your project use version 3.39.3 or later (and be happy!)

What am I missing here?

(3) By mlin (dnamlin) on 2022-12-21 02:12:26 in reply to 2 [link] [source]

Yikes sorry, I don't know what happened with my brain there; s/3.34.0/3.40.0/g to my original.

(4) By Larry Brasfield (larrybr) on 2022-12-21 02:21:12 in reply to 3 [link] [source]

When I do that substitution, I still get tip of trunk representing a progression, (albeit the reversal of a regression.) So I suggest you try a more recent version.

(5) By Keith Medcalf (kmedcalf) on 2022-12-21 02:23:46 in reply to 2 [link] [source]

I think you have that backwards, Larry. The current trunk produces the bad plan.

SQLite version 3.41.0 2022-12-17 23:13:12
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE t(
(x1...>     beg INTEGER NOT NULL, end INTEGER NOT NULL,
(x1...>     _beg INTEGER AS (beg) VIRTUAL, _len INTEGER AS (end-beg) VIRTUAL
(x1...> );
VM-steps: 37
Run Time: real 0.001 user 0.000000 sys 0.000000
sqlite> CREATE INDEX t_i ON t(_beg, _len);
VM-steps: 28
Run Time: real 0.001 user 0.000000 sys 0.000000
sqlite> EXPLAIN SELECT _rowid_ FROM t INDEXED BY t_i WHERE _beg BETWEEN 100 AND 200 AND _beg+_len >= 150;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     28    0                    0   Start at 28
1     OpenRead       0     4     0     2              2   root=4 iDb=0; t
2     ColumnsUsed    0     0     0     15             0
3     OpenRead       1     5     0     k(3,,,)        0   root=5 iDb=0; t_i
4     ColumnsUsed    1     0     0     3              0
5     Explain        5     0     0     SEARCH t USING INDEX t_i (_beg>? AND _beg<?) (~15360 rows)  0
6     Noop           0     0     0                    0   Begin WHERE-loop0: t
7     CursorHint     1     0     0     AND(expr,GE(ADD(c0,c1),150))  0
8     Integer        100   1     0                    0   r[1]=100
9     SeekGE         1     26    1     1              0   key=r[1]
10    Integer        200   1     0                    0   r[1]=200
11      IdxGT          1     26    1     1              0   key=r[1]
12      DeferredSeek   1     0     0                    0   Move 0 to 1.rowid if needed
13      IfNullRow      1     16    3                    0   if 1.nullRow then r[3]=NULL, goto 16
14      Column         0     0     3                    0   r[3]= cursor 0 column 0
15      Affinity       3     1     0     D              0   affinity(r[3])
16      IfNullRow      1     19    4                    0   if 1.nullRow then r[4]=NULL, goto 19
17      Column         1     1     4                    0   r[4]=t_i expr-column 1
18      Affinity       4     1     0     D              0   affinity(r[4])
19      Add            4     3     2                    0   r[2]=r[4]+r[3]
20      Lt             5     25    2                    80  if r[2]<r[5] goto 25
21      Noop           0     0     0                    0   Begin WHERE-core
22      IdxRowid       1     6     0                    0   r[6]=rowid; t.rowid
23      ResultRow      6     1     0                    0   output=r[6]
24      Noop           0     0     0                    0   End WHERE-core
25    Next           1     11    0                    0
26    Noop           0     0     0                    0   End WHERE-loop0: t
27    Halt           0     0     0                    0
28    Transaction    0     0     4     0              1   usesStmtJournal=0
29    Integer        150   5     0                    0   r[5]=150
30    Goto           0     1     0                    0
VM-steps: 0
Run Time: real 0.051 user 0.000000 sys 0.015625
sqlite>

That is

14      Column         0     0     3                    0   r[3]= cursor 0 column 0
should be
14      Column         1     0     3                    0   r[3]= cursor 1 column 0
if the covering index were being fully utilized. As it is, a seek into the main table is being done that is not required.

(6) By Larry Brasfield (larrybr) on 2022-12-21 02:34:22 in reply to 5 [link] [source]

Yarrghh, you're right. I was running something earlier but with my current directory as .../LibTrunk which, in my too-hasty thinking, equated to tip of trunk. (The directory is that, but my running shell was another version.)

(7) By mlin (dnamlin) on 2022-12-21 02:46:37 in reply to 6 [link] [source]

Thank you both for bearing with my confusion. I think I read 3.40.0 and embedded it as "three four" in my mind's latent space, which then stably diffused to 3.34.0 in the generative process for my original post.

(8) By Richard Hipp (drh) on 2022-12-21 03:29:09 in reply to 1 [link] [source]

(9) By Richard Hipp (drh) on 2022-12-21 13:03:53 in reply to 1 [link] [source]

Your simple work-around is to not define a virtual column to be just another column in the same table. Instead, add a unary "+" operator in front of the definition, so that the virtual column value is an expression rather than a pure column reference. So, instead of this:

_beg INTEGER AS (beg) VIRTUAL

Write this:

_beg INTEGER AS (+beg) VIRTUAL

(10) By Larry Brasfield (larrybr) on 2022-12-21 14:48:35 in reply to 1 [link] [source]

With a check-in this morning, Richard has put the advice of post #9 into effect on the trunk branch. With that change, the query plan is effectively (but not exactly) the same as the one you show for "SQLite 3.39.3 or earlier".

(11) By mlin (dnamlin) on 2022-12-21 19:54:01 in reply to 10 [link] [source]

Thank you Richard, Larry, & Keith; much appreciated!