Why sqlite sometimes prefers "SCAN TABLE"-index over a "SEARCH TABLE"-index?
(1.1) By Alexander Steinhöfer (stnshr) on 2021-03-25 07:19:34 edited from 1.0 [source]
I am a bit confused about why sqlite does not use the obvious index ("search table ...") and prefers another index leading to a "scan table using index", and wanted to ask if you could help me clear me confusion :)
In my case, I have this table:
CREATE TABLE IF NOT EXISTS trk_config_records( cfg_record_id INTEGER PRIMARY KEY, niid TEXT NOT NULL, modulename TEXT NOT NULL, level INTEGER NOT NULL, forward INTEGER NOT NULL, tag TEXT NOT NULL, user TEXT NOT NULL, expires INTEGER NOT NULL );
and the following indizes:
CREATE INDEX IF NOT EXISTS trk_cfg_idx_expires ON trk_config_records( expires ); CREATE INDEX IF NOT EXISTS trk_cfg_idx_niid_mod ON trk_config_records( niid, modulename DESC );
Apart from one insert-statement, the following two statements are used mostly:
DELETE FROM trk_config_records WHERE expires > 0 AND expires <= strftime('%s','now');
EXPLAIN QUERY PLAN SELECT niid, modulename, level, forward FROM trk_config_records WHERE expires = 0 OR expires > strftime('%s','now') ORDER BY niid, modulename DESC
EXPLAIN QUERY PLAN for the
DELETE(1.)-statement is as expected:
QUERY PLAN `--SEARCH TABLE trk_config_records USING INDEX trk_cfg_idx_expires (expires>? AND expires<?)
EXPLAIN QUERY PLAN for the
SELECT(2.)-statement does, however not utiliy
QUERY PLAN `--SCAN TABLE trk_config_records USING INDEX trk_cfg_idx_niid_mod
To my knowledge
SEARCH TABLE should always be preferable to
SCAN TABLE ... USING INDEX, so I was wondering whether I did something wrong in my
SELECT-statement, or if it's really more efficient for sqlite to not use
trk_cfg_idx_expires at all for the
WHERE-clause and insted only use
trk_cfg_idx_niid_mod which I've originaly created as "ordering index" for the
ORDER BY clause.
Thanks for your help!
(2) By Richard Hipp (drh) on 2021-03-25 13:00:16 in reply to 1.1 [link] [source]
The best index to use here depends on the content of the table.
On one extreme, suppose there are one million entries in your table and all of them match the WHERE condition. (Suppose they all have expires=0.) In that case, using the expires index is going to be a linear scan of the index anyhow, and afterwards you are going to need to do an O(NlogN) sort of the results. But if you simply scan the niid-modulename index, you will get the same answer in only O(N), which is clearly faster. (This example shows why SEARCH TABLE is not preferred over SCAN TABLE USING INDEX in every case.)
On the other extreme, suppose only one or two rows out of the million-row table match the WHERE condition. In that case, it makes more sense to use the expires index to look up the two rows in O(logN) time. And a sort of just one or two rows adds hardly anything to the overall runtime.
What we see here is that the best index to use depends on the content of the table. The examples above show cases where it is better to use the expires index and where it is better to use the niid-modulename index. In between these two extremes will be a cross-over point where the two techniques run at about the same speed.
The task of the query planner it to try to discern which of these two query plans is the best. And it has to do so without knowledge of the internal data of the table. The query planner could query the table to find out more about how often the WHERE clause is true, but that would likely take far more time than just running the query. So the query planner has to guess. Sometimes it guesses wrong.
You, the developer, might have deeper knowledge of your data and might know that one plan is preferable to the other. In that case, you can provide hints to the query planner to cause it to prefer a specific query plan. Items 6 through 9 of the Checklist For Avoiding Or Fixing Query Planner Problems provide details.
In your case, the easiest way to get the query planner to use the expires index is to add a "+" sign in front of the "niid" term of the ORDER BY clause:
SELECT niid, modulename, level, forward FROM trk_config_records WHERE expires = 0 OR expires > strftime('%s','now') ORDER BY +niid, modulename DESC;
(3) By Jim Morris (jimmorris33215) on 2021-03-25 15:46:44 in reply to 2 [link] [source]
On a related note, when an expires date value of zero is forever, I've found it simpler to store a very large date value in the database, using a constant in my code. This simplifies the query, making it easier to read at the small cost of converting to and from zero in code.
(4) By Alexander Steinhöfer (stnshr) on 2021-03-26 06:04:24 in reply to 2 [link] [source]
Aaaah! Thank your for that very detailed explanation!
That cleared many things up for me and helped my understanding as to how to approach the
EXPLAIN QUERY PLAN-output. Thanks again!
(5) By Alexander Steinhöfer (stnshr) on 2021-03-26 06:10:54 in reply to 3 [link] [source]
Yes - I am still on the fence about that. On one hand it would get rid of the
OR and make for an even easier reading of the query (also, a value that expires in, say, 10 years has -for all practical purposes- an infinite expires time in my context), on the other hand it's easier to explain to the users of our application that they can see all entries that don't expire by specifying
0 instead of an arcane/very large value.
Otherwise I could work with a switch like
--no-expires (or a checkbox in GUI terms) and internally insert that arcane value.