An index is not used to optimize GROUP BY query
(1) By anonymous on 2022-01-09 19:37:37 [link] [source]
Considering simple table and a query
DROP TABLE IF EXISTS orders; CREATE TABLE orders ( time INT, client TEXT, item TEXT ); CREATE INDEX wrong_index ON orders ( client, time ); CREATE INDEX good_index ON orders ( time, client ); EXPLAIN QUERY PLAN SELECT count(*) FROM orders WHERE time > 10 GROUP BY client;
outputs: --SCAN orders USING COVERING INDEX wrong_index
Why? Isn't it optimal to use good_index and speed up query dramatically? In my case a more complex table has many million records and it is extremely slow to query orders made on a particular day or an hour.
(2) By Igor Tandetnik (itandetnik) on 2022-01-09 20:49:55 in reply to 1 [source]
good_index is not helpful for this query.
GROUP BY client part needs rows with the same
client next to each other, but in
good_index they aren't.
ANALYZE on your actual database. It's possible that the optimizer will then be able to determine that
time > 10 condition selects a small subset of records, and it's worth using the index for that part and then sort the resulting records for the
GROUP BY part.
Or, you can make it
GROUP BY +client. The unary plus is a no-op but will suppress the use of any index for the GROUP BY. Chances are then high that
good_index will be selected to satisfy
time > 10 condition.
(3) By anonymous on 2022-01-09 22:14:11 in reply to 2 [link] [source]
Hm, running ANALIZE changed the behavior from SCAN to SEARCH using the same wrong index but queries become much faster. I am now curious what magic info did it gather. But anyway thanks for the good hint!
(4) By Igor Tandetnik (itandetnik) on 2022-01-09 22:32:27 in reply to 3 [link] [source]
Probably discovered that
client has low cardinality (few distinct values). So it could essentially run
SELECT count(*) FROM orders WHERE client=:c AND time > 10 for each distinct value
order.client, and return a union of these.