SQLite Forum

Tracing why SQLITE is making an index decision
Login

Tracing why SQLITE is making an index decision

(1) By Deon Brewis (deonb) on 2021-02-26 03:15:09 [link] [source]

I have two almost identical machines each running almost exactly the same SQLITE3 database. Each of the machines have the same 2 indexes. On one machine it's choosing one index, on the other it's choosing the other index to satisfy the same query. And it eludes me as to why.

The query is a simple:

select Id from Table where DateStart > xxx and DateEnd < yyy and Flag = 1

The index that I want has a 'WHERE' clause:

DateStart, DateEnd WHERE Flag = 1
The other index (that's chosen on one device but not the other) just has DateStart and a bunch of unrelated columns.

Both of the database have had ANALYZE run on them. The sqlite_stat1 entries for the two indexes are identical.

Is there perhaps a way I can list the set of index choices SQLITE has for a table and see why it's preferring the one index over the other? (Even in a debugger with a breakpoint will help).

(2.1) By Warren Young (wyoung) on 2021-02-26 05:37:15 edited from 2.0 in reply to 1 [link] [source]

  1. You're probably using different versions of SQLite on these two machines. A given SQLite version should be deterministic in its behavior.

  2. In a recent version of the shell, give the ".eqp on" command and issue your query. Post the results.

  3. Ditto, but give the ".expert" command first. It will give a simplified version of the prior command's output, more aimed at the end-user finding their own solutions than having others give them answers.

(3.1) By Deon Brewis (deonb) on 2021-02-26 21:53:46 edited from 3.0 in reply to 2.1 [link] [source]

  1. It's the same version of SQLite. I can put the same database file on the same device as well and have the same problem via the command line.

2 and 3) It doesn't really seem to show anything additional useful above explain query plan (it doesn't show why it's NOT considering an index):

On file A:

SEARCH TABLE Media AS OBJ USING INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 (DateRangeStart<?)

On file B:

SEARCH TABLE Media AS OBJ USING INDEX dynidx_25654ad4d39c3235f09739e35d5e81768e2c3199 (DateRangeStart<?)

Both of those exist in both files:

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeStart,DateRangeEnd) where SearchPartitions=1
CREATE INDEX dynidx_25654ad4d39c3235f09739e35d5e81768e2c3199 on Media(DateRangeStart desc,NormalizedFileNameNoExt desc,UniqueHash desc)

And the stats are the same in both files:

Media | dynidx_25654ad4d39c3235f09739e35d5e81768e2c3199 | 590433 2 2 1
Media | dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 | 520411 2 2

(4) By Simon Slavin (slavin) on 2021-02-28 11:25:53 in reply to 1 [link] [source]

Having seen your response to other questions …

Using the sqlite shell tool, can you look at the result of

.schema Table

I particularly want to know whether the columns are declared with affinities (types). Also please

Can you confirm that in the command you type into the shell tool

select Id from Table where DateStart > xxx and DateEnd < yyy and Flag = 1

both 'xxx' and 'yyy' are integers entered without quotes ? In other words, just a series of digits ? Can you give an example ? In your code are you binding integers to '?' ?

Please also

Create a new index on just (DateStart,DateEnd) without. the Flag. Create a new index on just (DateStart). Then run your .eqp (or use EXPLAIN QUERY PLAN) again and see which index SQLite uses.

(You can delete the new indexes when you're done.)

(5) By Deon Brewis (deonb) on 2021-03-01 06:32:30 in reply to 4 [link] [source]

The actual expanded query as reported by SQLITE TRACE:

SELECT COUNT(*) FROM (SELECT OBJ.Id AS Id FROM Media AS OBJ WHERE SearchPartitions=1 AND OBJ.DateRangeStart <= 1396310399998 AND OBJ.DateRangeEnd >= 1393632000000)

.schema Table is a huge block of text, but I think this is what you're looking for:

   DateRangeStart int,
   DateRangeEnd int,
   SearchPartitions int,

However, this isn't really about troubleshoot just this issue, but to learn in general how to figure out the SQLITE index choices. I've fixed this specific issue by changing the index from:

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeStart,DateRangeEnd) where SearchPartitions=1
to:
CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeEnd,DateRangeStart) where SearchPartitions=1

PS: also tried a:

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeStart DESC,DateRangeEnd) where SearchPartitions=1
but that didn't work.

Seems like SQLITE prefers doing a DateRangeEnd >= over doing a DateRangeStart <=, whether DateRangeStart is ASC or DESC.

However, that's not really that important, I can generally coerce SQLITE into doing what I need by just randomly swapping stuff like this around. However, I'd like to learn more about how it makes index decisions so I can better figure out how to troubleshoot it myself.

(6.1) By Simon Slavin (slavin) on 2021-03-01 12:23:06 edited from 6.0 in reply to 5 [source]

I've fixed this specific issue by changing the index from

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeStart,DateRangeEnd) where SearchPartitions=1

to

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeEnd,DateRangeStart) where SearchPartitions=1

Bingo. Here's what's happening. The ANALYZE command lets SQLite see how 'chunky' each column is. Suppose you are trying to search a 10000 row table on both DateRangeStart and DateRangeEnd. And there are 4 different values for DateRangeStart but 100 different values for DateRangeEnd. The chunks in DateRangeStart are bigger, and there are fewer of them.

If you search using an index on DateRangeStart it's going to narrow down your search to 2500 rows. But if you search using an index on DateRangeEnd you'll get just 100 rows, which will get you closer to the rows you want, and do it faster.

So SQLite is weighing up the advantages of searching an existing index which doesn't help much, against just iterating through the whole table, which gives it all the values it needs (which in this case you don't care about because you're just counting them, but SQLite doesn't know that).

So what you asked for … a general understanding of the problem rather than a solution to just this one example … is that the order of terms in an index matters.

(7.1) By Deon Brewis (deonb) on 2021-03-01 17:01:35 edited from 7.0 in reply to 6.1 [link] [source]

In this case, all the DateRangeStart and DateRangeEnd values are identical so there wouldn't be any actual difference in chunking.

The more interesting question is why would 2 database files that have identical schema, similar table sizes, and identical sqlite_stat1 entries for the two indexes behave differently?

What other information (apart from schema and sqlite_stat1) is SQLite looking at when it makes an index decision based on ANALYZE? I don't mind the behavior - I mind not being able to predict the behavior.

PS: <= DateRangeStart and >= DateRangeEnd are both asking to search an index to a specific point and counting everything after that until it hits the other value. It should behave similarly if DateRangeStart is DESC, but SQLITE clearly has a preference for the >= and ASC over <= and DESC.