Query planner fails to use obvious index

(1) By EricG (EriccG) on 2020-06-25 19:51:42 [link]

Hi,

I have found a case in SQLite 3.32.2 where the query planner does not seem to use an obvious index when a comparison operator is used, minimal case below

    create table test (id integer primary key, key_id integer);
    create index test_key_idx on test(key_id);

with that schema

    explain query plan
    select min(id) from test where key_id >= 1

returns "SEARCH TABLE test", which does not use the index (and is very slow in my real world case), while the following "workaround" query 

    explain query plan
    select min(id) from test where key_id between 1 and 999999

returns "SEARCH TABLE test USING COVERING INDEX test_key_idx (key_id>? AND key_id<?)", which uses the index and is fast.

And even the brute force workaround query is fast

    select min(id) from test where key_id between 1 and (select max(key_id) from test)

as it uses the index in both the main and the sub-select.

Thanks

(2) By luuk on 2020-06-27 10:12:24 in reply to 1 [link]

> explain query plan

> select min(id) from test where key_id >= 1

> returns "SEARCH TABLE test", which does not use the index (and is very slow in my real world case), while the following "workaround" query

Because 99.9% of the table has to be read, to know the minimal value of `id`, the choice is made to read the complete table.

Nothing wrong with that.

(3) By Kees Nuyt (knu) on 2020-06-27 13:56:44 in reply to 1 [link]

Did you run ANALYZE after populating the table and before running the test?

(4) By Keith Medcalf (kmedcalf) on 2020-06-27 14:51:05 in reply to 3 [link]

This does make a difference.  At least in the case where STAT4 statistics are being collected.

```
sqlite> create table test (id integer primary key, key_id integer);
sqlite> create index test_key_idx on test(key_id);
sqlite> select min(id) from test where key_id >= 1;
QUERY PLAN
`--SEARCH TABLE test (~983040 rows)
┌─────────┐
│ min(id) │
├─────────┤
│         │
└─────────┘
sqlite> insert into test (key_id) select randomv(100)-98 from wholenumber where value between 1 and 100000;
QUERY PLAN
`--SCAN TABLE wholenumber VIRTUAL TABLE INDEX 10: (~24 rows)
sqlite> select min(id) from test where key_id >= 1;
QUERY PLAN
`--SEARCH TABLE test (~896 rows)
┌─────────┐
│ min(id) │
├─────────┤
│ 132     │
└─────────┘
sqlite> analyze;
sqlite> select min(id) from test where key_id >= 1;
QUERY PLAN
`--SEARCH TABLE test USING COVERING INDEX test_key_idx (key_id>?) (~1024 rows)
┌─────────┐
│ min(id) │
├─────────┤
│ 132     │
└─────────┘
```

You will note that no matter what you do, you will have to scan either the whole table or the whole index.  Without histogram data there is no way to determine whether it is more efficient to search the id's in order looking for a key_id greater or equal to 1, or whether to search the entire index starting from the first key_id >= 1 looking for the minimum id.

If you place a constraint which makes one or the other the obvious default solution, then that is the method used (ie, the constraint BETWEEN on key_id) places a constraint on the number of rows scanned, so therefore it is chosen as being the quickest method of culling candidates because it will (in the worst case) *always* result in less rows being scanned than the worst case for a full table scan.

(5) By EricG (EriccG) on 2020-06-29 05:51:26 in reply to 2 [link]

I disagree, because the BETWEEN variants of the query do use the index, including the last "brute force" one which uses the index to find the maximum value of id (and you can try, SQLlite can use the index to find the minimum value of the index)

(6) By EricG (EriccG) on 2020-06-29 06:32:08 in reply to 4 [link]

Would not scanning the whole index always be beneficial in that particular case ?

My reasonning is that scanning the index is going to involve at worst the same number of rows, and at best a much lower number of rows.

And even when you are scanning the same number of rows, scanning the index would involve less blocks, as the index is going to be smaller than the table (in terms of storage size) in most circumstances.

FWIW I first experimented with that query as a way to reduce block I/O, the table (which I simplified here) has a few other fields, making it about 10x larger in bytes than the index (measured by comparing file sizes after vacuum, once before and once after dropping the index)

(7) By Ryan Smith (cuz) on 2020-06-29 14:54:56 in reply to 6 [link]

>Would not scanning the whole index always be beneficial in that particular case ?

That is not quite correct.

I haven't tested the example at hand in this thread, but just adding a note on the idea that scanning an index is more useful/speedy than scanning a table.

It is not.

A table is (at least in some storage engines, like SQLite) itself simply a covering Index with its INDEXED key being its rowid (unless it's a WITHOUT ROWID table, then the PK is the indexed key).

Scanning the Table *IS* in fact scanning the best possible Index. Any other Index used as a lookup has the immediate deficit that once you find a value in there, it is found with a reference to the main table key (rowid etc.) which in turn requires *another* indexed lookup to find the value row in the main table.

The only saving grace for using another Index, is if using the other index saves so many cycles (thanks to the BTREE look-ups) that it overshadows the cycles wasted on the re-lookup into the main table Index. The Query planner has to cleverly try and calculate and weigh these costs against each other and try to make informed but applicable-in-the-general-sense guesses on when to use the Index and when to go straight for the table lookup. On occasion, a particular combination of schema+data forms a structure that does not play to the Query-planner's guesses and is in fact faster if another method is used.[1]

The best general-case has to be served, but there are tools to help. The ANALYZE method for instance builds a table that looks at the actual shape of the data and better informs the query planner (which is why someone asked if you ran Analyze). There are other tools like query planner hints or forcing the outer loops, which all falls outside of this alreadt-too-long post - so let me end by just saying, no, it is not better to scan the index rather than the table. That is only true if the Index significantly reduces lookup cost, and unless that difference is made obvious to the query planner, it rightfully won't choose the index.

[1] Of course if the Index is itself a covering index and all the referenced fields are contained within the cover, then it becomes a more useful scan.

Cheers,
Ryan

(8) By Larry Brasfield (LarryBrasfield) on 2020-06-29 15:41:07 in reply to 7 [link]

Regarding,

> A table is (at least in some storage engines, like SQLite) itself simply a covering Index with its INDEXED key being its rowid ...

Well, with all due respect, that is not quite correct either. And the cases where it is not true show when using an index could be faster than a b-tree guided table scan.

In fact, as documented for the [SQLite3 file format](https://sqlite.org/fileformat2.html), at section "1.6. B-tree Pages", so-called "payload" (data other than the rowid) can be stored on the same pages that contain the rowid b-tree structure. (See 11th paragraph, among others.) So such pages **may** be more than simply a rowid covering index. And for that reason, when the only information to be retrieved is a rowid that can be discerned from the b-tree alone, a b-tree only index traversal *could* require fewer pages to be read from disk (or other slower forms of memory) than a traversal that also requires useless payload to be pulled into main memory.

I acknowledge that this only differs from nit-picking by a smidgen. But since we're getting all technical here, I figured the correction is apropos.

It should also be noted that the use case addressed by this and related posts fits the definition of "edge case" to a T. The "missed" optimization opportunity is likely to be no such thing when we consider that optimization takes time too.

(9) By Rowan Worth (sqweek) on 2020-06-30 02:35:54 in reply to 6 [link]

> Would not scanning the whole index always be beneficial in that particular case ?
> 
> My reasonning is that scanning the index is going to involve at worst the same number of rows, and at best a much lower number of rows.

I was thinking the same thing, but after reading Keith's post closer I think his point is that the best case for the _table_ scan makes it a very attractive option.

Just to remind myself of the query: select min(id) from test where key_id >= 1;


If you decide use the index to look for key_id >= 1 and then find the minimum id amongst those rows, you are obliged to search the whole index to satisfy the query (well, you can skip the key_id < 1 portion).

OTOH, if you start scanning the table in order of the id column and check for a key_id >= 1, you can abort the scan as soon as you find a matching row because you've found the correct answer. So at best, the table scan actually needs to read less rows.

That tradeoff gets less desirable as the table grows, and obviously in your case the query planner hasn't made a great choice (or you wouldn't be here). So there may be potential here for some heuristic improvements, or it may be that sqlite is able to make a better decision if it has more information about the data (via ANALYZE as Keith suggested).

(10) By EricG (EriccG) on 2020-06-30 05:10:48 in reply to 9

I understand better the query planner choice now.

In my case ANALYZE or not does not have an effect, I suspect this is because key_id is a relation to another table with a number of row of the same magnitude (though not 1:1), and both tables (and their ids) grow with time.

A simple histogram would tell a "flat" story, while in practice there is a strong correlation between id and key_id (older entries have lower numbers, recent entries have the higher numbers), and my key_id comparison is essentially a filter on the tables' tails.

The worst case happens in my case if the key_id criterion selects just the last row, the query planner currently does a full table scan with "key_id >" while a "key_id BETWEEN" scans just one row.

In my case I can live with a BETWEEN, and setting the higher bound to the Max(Int64) is going to be safe.

It is indeed a less obvious choice than I first thought. Thanks all!

(11) By Ryan Smith (cuz) on 2020-06-30 16:32:44 in reply to 8 [link]

> Well, with all due respect, that is not quite correct either. And the cases where it is not true show when using an index could be faster than a b-tree guided table scan.

With even more due respect, there is no case in which scanning a table's information is done faster by scanning any index which references the table.

Your point is good though, but I think perhaps my point was misunderstood - I'm not talking about look-ups, they will mostly be faster[1], I'm specifically (and only) referring to table-scans. i.e. a process in which EVERY row of a table is visited and read.

A simple armchair experiment will show that between these tow algorithms:

```
A:
for i = 0 to lastitem in Index A
read r from A
look-up r in Table T
read row r from T
do_whatever;

B:
for r = 0 to lastrow in Table T
read row r from T
do_whatever;
```

That B MUST in principle always be faster.

If there exists a method or a sort of linking between indexes and tables that makes A faster in some way, then I'm a monkey's uncle, and that method should ALWAYS be used to store table-indexes in stead of whatever silly slow method now underlies the row-id.

[1] Further to the original point, even if you do use look-ups -
If you draw two graphs depicting number of rows hit in a query, one for table-scan and one for number-of-Indexed-look-ups, both vs. cpu cycles, then:

- 1: The table-scan should be a near-perfectly flat line (because a scan always takes the same cycles to complete, unless we have a LIMIT clause or some way to know that we are done at any point), and
- 2: The Indexed-lookup should start very low for a single look-up and then steadily grow faster until it crosses the line of the table-scan graph, but it MUST cross that line BEFORE number-of-row-hits = number-of-rows-in-table. There is no situation in which it can finish underneath that line.

Hoping that clarifies my point some!

(12.2) By Larry Brasfield (LarryBrasfield) on 2020-06-30 20:42:53 edited from 12.1 in reply to 11 [link]

Regarding the assertion,

> there is no case in which scanning a table's information is done faster by scanning any index which references the table.

Put that way, probably not. But if we substitute "a table's information" with "that subset of the table's information which is necessarily contained in an index" (and "scanning any index" with "scanning an index of the table on that same subset"), which is close to the OP's posed issue, then in some cases (but not the OP's case [a]) scanning the index is all that is needed and it **could** be faster in some cases. To my point, those cases would be where the indexed field subset is not identical with the table's fields. IOW, where the table contains *more* information by volume (or byte count) than the index does. In that case, (but not the OP's where the index and table contain the same information), scanning the index will produce the results queried without having to pull into memory the additional data that is in the table b-tree pages but not the index b-tree pages.

[a. The OP's table contained only a single field beyond the rowid, and that field was indexed; there was no additional data in the table beyond what would have to be in the index. ]

This point is illustrated by an armchair experiment differing from yours:

```
A: // Index scan
for i = 0 to lastitem in Index A
pull in new b-tree page of Index A if necessary
read r from A
do not look-up r in Table T because it was taken from the Index A.
do not read row r from T because the required data is in Index A.
do_whatever;
B: // Table scan
for r = 0 to lastrow in Table T
pull in new b-tree page of Table T if necessary
read row r from T
do_whatever;
```

Comparing the actual work in A and B, it looks identical, but it is not when the pages of Table T are more numerous than the pages of Index A due to having more data in them, quite possibly a lot more.

Where we differ, I surmise, is in perception of need for "read row r from Table T" once r has been traversed to in the Index A b-tree. I am pretty sure that the b-tree must contain, in addition to its bare tree structure, the very data upon which the b-tree is ordered and sometimes balanced. Hence, when no data beyond what was indexed upon is needed for a query, what is needed is in the index and only the index need be read.

If I am wrong about the b-tree containing the indexed-upon data, I am willing to concede that my earlier point and above argument are mere fantasy. However, in the [Database File Format](https://sqlite.org/fileformat2.html) doc can be found this sentence confirming my notion: <code>For an index b-tree, the key is always arbitrary in length and hence the payload is the key.</code> Perhaps that payload is in such a form that the data which went into it can no longer be teased apart, and the indexed table must be consulted to retrieve it. If so, depending on how much data must be retrieved versus how much uselessly pulled into memory by using only the index to locate it, the performance contest could go either way. However, I expect that for single-field indexes, scanning on that same field is possible by reading just the index and quite possibly faster.

FWIW, I will be happy to be educated on this where needed.

(13) By Ryan Smith (cuz) on 2020-06-30 21:36:29 in reply to 12.2 [link]

>  But if we substitute "a table's information" with "that subset of the table's information which is necessarily contained in an index"...

Oh 100% agreed, as I think (or had hoped) my original message conveyed. If the looked-up Index itself already contains all the fields required in the look-up, filter, data required for the query, etc. then there is no need to read the table at all, and in that case the index always wins. (The very reason why covering indexes are useful).

I'll admit that my post was not so much regarding the OP's schema, I think I mentioned it at the top, but more about the general misconception that the Index is always better to use. It isn't and there's a reason the QP favours a table-scan over an index sometimes, which may help explain on occasion seeing a table-scan proposed in a query plan when one intuitively feels it shouldn't.

This may well not apply to the OP's case specifically.


PS: I still feel so dirty saying "Indexes" as opposed to "Indices"... but it's the way of the sea now.

(14.1) By Larry Brasfield (LarryBrasfield) on 2020-07-01 02:22:03 edited from 14.0 in reply to 13 [link]

> Oh 100% agreed, as I think (or had hoped) my original message conveyed.

And I thought we were arguing a disagreement! If not for my acquired distaste for arguments about arguments, I might try to discern who was most confused.

> not so much regarding the OP's schema ... more about the general misconception that the Index is always better to use

Amusingly, (at least to me), the OP's schema as posted contains two b-trees: (1) a table b-tree keyed by rowid with a "key_id" field tagging along as data; and (2) an index b-tree keyed by the key_id values, with a rowid tagging along in case the index becomes useful for quickly reaching a table row. Those b-trees contain the exact same data, and choosing between them for a query involving both fields, where they both are subject to test that does not strictly match the b-tree ordering, was always going to be a close call. Moreover, being close, it's a call I would be sorry to see over-optimized in a system where 'Lite' is a virtue.

**Warning: Off-topic matter follows.**

> saying "Indexes" as opposed to "Indices"

When I had some responsibility for raising children, I always hesitated to "correct" their application of usual language rules to cases where, for reasons better lost in history, exceptions are made by more learned folks. Language changes, and if it becomes more regular in the process, I am glad to see that.

(15) By Richard Hipp (drh) on 2020-07-01 15:00:20 in reply to 1 [link]

SQLite is gives consideration to two algorithms for this query:

  1.  Scan the "test" table (which is already in "id" order) looking
      for the first entry for which key_id>=1.  The scan stops as soon
      as any entry with key_id>=1 is seen because it knows that entry will have
      the smallest possible "id" value.

  2.  Do a search for the first entry in the test_key_idx table with key_id>=1
      and then scan that entry and all subsequent entries, looking for the
      smallest "id" value.  All index entries with key_id>=1 must be scanned
      in this case since we never know which one will have the smallest value
      for "id".

The choice of algorithm is not obvious here.  Algorithm 1 will work best
if most entries have key_id>=1 and algorithm 2 will work best if most entries
have key_id<1.  The query planner does not normally know what the distribution
of key_id values is, so it does not normally have a good way to decide which
algorithm is best.

The "normally" in the previous sentence means "unless you compile with
SQLITE_ENABLE_STAT4 and run ANALYZE". If you do compile with SQLITE_ENABLE_STAT4
and run ANALYZE prior to your query, then the query planner does have some
notion of the distribution of key_id values, and it does appear to choose
the better algorithm depending on that distribution.

(16) By EricG (EriccG) on 2020-07-03 05:44:19 in reply to 15 [link]

Thanks for the explanation, this is what I had gathered from previous replies.