Why sqlite sometimes prefers "SCAN TABLE"-index over a "SEARCH TABLE"-index?

(1.1) By Alexander Steinhöfer (stnshr) on 2021-03-25 07:19:34 edited from 1.0 [source]

Hi all,

I am a bit confused about why sqlite does not use the obvious index ("search table ...") and prefers another index leading to a "scan table using index", and wanted to ask if you could help me clear me confusion :)

In my case, I have this table:

CREATE TABLE IF NOT EXISTS trk_config_records(
   cfg_record_id INTEGER  PRIMARY KEY,
   niid          TEXT     NOT NULL,
   modulename    TEXT     NOT NULL,
   level         INTEGER  NOT NULL,
   forward       INTEGER  NOT NULL,
   tag           TEXT     NOT NULL,
   user          TEXT     NOT NULL,
   expires       INTEGER  NOT NULL
);

and the following indizes:

CREATE INDEX IF NOT EXISTS trk_cfg_idx_expires ON trk_config_records(
   expires
);

CREATE INDEX IF NOT EXISTS trk_cfg_idx_niid_mod ON trk_config_records(
   niid,
   modulename DESC
);

Apart from one insert-statement, the following two statements are used mostly:

DELETE FROM trk_config_records
WHERE expires > 0 AND expires <= strftime('%s','now');

EXPLAIN QUERY PLAN SELECT niid, modulename, level, forward
FROM trk_config_records
WHERE expires = 0 OR expires > strftime('%s','now')
ORDER BY niid, modulename DESC

EXPLAIN QUERY PLAN for the DELETE(1.)-statement is as expected:

QUERY PLAN
`--SEARCH TABLE trk_config_records USING INDEX trk_cfg_idx_expires (expires>? 
AND expires<?)

EXPLAIN QUERY PLAN for the SELECT(2.)-statement does, however not utiliy trk_cfg_idx_expires:

QUERY PLAN
`--SCAN TABLE trk_config_records USING INDEX trk_cfg_idx_niid_mod

To my knowledge SEARCH TABLE should always be preferable to SCAN TABLE ... USING INDEX, so I was wondering whether I did something wrong in my SELECT-statement, or if it's really more efficient for sqlite to not use trk_cfg_idx_expires at all for the WHERE-clause and insted only use trk_cfg_idx_niid_mod which I've originaly created as "ordering index" for the ORDER BY clause.

Thanks for your help!

-Alex

(2) By Richard Hipp (drh) on 2021-03-25 13:00:16 in reply to 1.1 [link] [source]

The best index to use here depends on the content of the table.

On one extreme, suppose there are one million entries in your table and all of them match the WHERE condition. (Suppose they all have expires=0.) In that case, using the expires index is going to be a linear scan of the index anyhow, and afterwards you are going to need to do an O(NlogN) sort of the results. But if you simply scan the niid-modulename index, you will get the same answer in only O(N), which is clearly faster. (This example shows why SEARCH TABLE is not preferred over SCAN TABLE USING INDEX in every case.)

On the other extreme, suppose only one or two rows out of the million-row table match the WHERE condition. In that case, it makes more sense to use the expires index to look up the two rows in O(logN) time. And a sort of just one or two rows adds hardly anything to the overall runtime.

What we see here is that the best index to use depends on the content of the table. The examples above show cases where it is better to use the expires index and where it is better to use the niid-modulename index. In between these two extremes will be a cross-over point where the two techniques run at about the same speed.

The task of the query planner it to try to discern which of these two query plans is the best. And it has to do so without knowledge of the internal data of the table. The query planner could query the table to find out more about how often the WHERE clause is true, but that would likely take far more time than just running the query. So the query planner has to guess. Sometimes it guesses wrong.

You, the developer, might have deeper knowledge of your data and might know that one plan is preferable to the other. In that case, you can provide hints to the query planner to cause it to prefer a specific query plan. Items 6 through 9 of the Checklist For Avoiding Or Fixing Query Planner Problems provide details.

In your case, the easiest way to get the query planner to use the expires index is to add a "+" sign in front of the "niid" term of the ORDER BY clause:

SELECT niid, modulename, level, forward
FROM trk_config_records
WHERE expires = 0 OR expires > strftime('%s','now')
ORDER BY +niid, modulename DESC;

(3) By Jim Morris (jimmorris33215) on 2021-03-25 15:46:44 in reply to 2 [link] [source]

On a related note, when an expires date value of zero is forever, I've found it simpler to store a very large date value in the database, using a constant in my code. This simplifies the query, making it easier to read at the small cost of converting to and from zero in code.

(4) By Alexander Steinhöfer (stnshr) on 2021-03-26 06:04:24 in reply to 2 [link] [source]

Aaaah! Thank your for that very detailed explanation!

That cleared many things up for me and helped my understanding as to how to approach the EXPLAIN QUERY PLAN-output. Thanks again!

(5) By Alexander Steinhöfer (stnshr) on 2021-03-26 06:10:54 in reply to 3 [link] [source]

Yes - I am still on the fence about that. On one hand it would get rid of the OR and make for an even easier reading of the query (also, a value that expires in, say, 10 years has -for all practical purposes- an infinite expires time in my context), on the other hand it's easier to explain to the users of our application that they can see all entries that don't expire by specifying 0 instead of an arcane/very large value.

Otherwise I could work with a switch like --no-expires (or a checkbox in GUI terms) and internally insert that arcane value.