SQLite Forum

Min/max and index usage?
Login

Min/max and index usage?

(1) By Glenn W (gwaldron) on 2020-06-04 16:32:17 [link]

Hello,

I have an index:

`CREATE UNIQUE INDEX tile_index ON tiles (zoom_level, tile_column, tile_row)`

And the following query:

`SELECT min(zoom_level), max(zoom_level) from tiles`

The query is pretty slow for large databases and I suspect it's not using the index. I looked at the "min/max optimization" but not quite sure if it applies. Any advice appreciated.

Glenn

(2) By Richard Hipp (drh) on 2020-06-04 16:45:58 in reply to 1

The following query will likely be much faster:

~~~~~
    SELECT (SELECT min(zoom_level) FROM tiles), 
           (SELECT max(zoom_level) FROM tiles);
~~~~~

The current SQLite query planner is not clever enough to convert your
original query into the fast equivalent.

(3) By Glenn W (gwaldron) on 2020-06-04 18:20:50 in reply to 2 [link]

That helps! Thank you Richard.

Glenn

(4) By zhangyushao (zhangysh1995) on 2020-06-08 13:14:37 in reply to 2 [link]

Curious, why this one is faster than the original one?

It seems the only difference is that it splits the aggregation functions into two queries.

(5) By Ryan Smith (cuz) on 2020-06-08 13:30:49 in reply to 4 [link]

> It seems the only difference is that it splits the aggregation functions into two queries.

It's an important difference, because the query planner has optimizations for finding the MIN of an Indexed column (same way you can find the first entry in a phone-book very quick) and same for the MAX of an Indexed column, but it has currently no built-in short-cut case for getting both MIN and MAX in one go... so it needs to scan the entire Index to get the lowest and highest in such a single query.

When the query is split into the two simpler lookups, it can happily apply the cheat-sheet short-cut ways it knows for doing those special cases.

(6) By Igor Tandetnik (itandetnik) on 2020-06-08 13:31:56 in reply to 4 [link]

Separate queries for `min` and `max` both use the index, just picking its first and last entry. The combined query doesn't use the index and instead performs a full scan, because the planner is not smart enough to use the same index twice for the same table, in different directions.

(7) By Ryan Smith (cuz) on 2020-06-08 13:52:13 in reply to 4 [link]

I should perhaps add, in case anyone looks at this and thinks "Well why don't the query planner just add the MIN+MAX case then?"

It's a small bit of extra code that must also be maintained, plus a small bit of extra bloat in a very lean query planner - both of these would be quite acceptable if the gain is significant, or at least useful.

As it stands, if you add all the aggregate queries together which use some combination of MIN() and MAX(), you will find that[1]:

- MIN() alone is queried about 20% of the time,
- MAX() alone about 79% of the time, and
- MIN() + MAX() less than 1% of the time.

Add to that the simple optimization work-around of doing 2 queries, and the "need" diminishes rapidly.



[1] Yes, these statistics were completely made up on the spot, but I feel it's fairly accurate in the broad sense, though I would love to be corrected with actual stats, or simply an enlightened argument.