An index is not used to optimize GROUP BY query
(1) By anonymous on 2022-01-09 19:37:37
Consider this simple table and query:
DROP TABLE IF EXISTS orders;
CREATE TABLE orders ( time INT, client TEXT, item TEXT );
CREATE INDEX wrong_index ON orders ( client, time );
CREATE INDEX good_index ON orders ( time, client );
EXPLAIN QUERY PLAN SELECT count(*) FROM orders WHERE time > 10 GROUP BY client;
outputs: --SCAN orders USING COVERING INDEX wrong_index
Why? Wouldn't it be optimal to use good_index and speed the query up dramatically? In my case a more complex table has many millions of records, and it is extremely slow to query orders made on a particular day or hour.
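The behavior can be reproduced with SQLite's Python bindings; this is a minimal sketch with made-up sample data (the client/item values are hypothetical, only the schema comes from the post). The exact EXPLAIN QUERY PLAN wording varies across SQLite versions.

```python
import sqlite3

# In-memory database reproducing the schema from the post.
con = sqlite3.connect(":memory:")
con.executescript("""
    DROP TABLE IF EXISTS orders;
    CREATE TABLE orders ( time INT, client TEXT, item TEXT );
    CREATE INDEX wrong_index ON orders ( client, time );
    CREATE INDEX good_index ON orders ( time, client );
""")

# Hypothetical sample data: 100 orders spread over three clients.
rows = [(t, f"client{t % 3}", f"item{t}") for t in range(100)]
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Show which index the planner picks; typically wrong_index, because
# it delivers rows already grouped by client.
for row in con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT count(*) FROM orders WHERE time > 10 GROUP BY client"):
    print(row)

# The query itself: one count per client over rows with time > 10.
counts = dict(con.execute(
    "SELECT client, count(*) FROM orders WHERE time > 10 GROUP BY client"))
print(counts)
```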
(2) By Igor Tandetnik (itandetnik) on 2022-01-09 20:49:55 in reply to 1
good_index is not helpful for this query. The GROUP BY client part needs rows with the same client next to each other, but in good_index they aren't.
Try running ANALYZE on your actual database. It's possible that the optimizer will then be able to determine that the time > 10 condition selects a small subset of records, so it's worth using the index for that part and then sorting the resulting records for the GROUP BY part.
Or, you can write GROUP BY +client. The unary plus is a no-op but will suppress the use of any index for the GROUP BY. Chances are then high that good_index will be selected to satisfy the time > 10 condition.
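The unary-plus trick can be sketched like this; the sample data is invented for illustration, and the plan text printed at the end depends on the SQLite version, but the two query forms always return identical results.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders ( time INT, client TEXT, item TEXT );
    CREATE INDEX wrong_index ON orders ( client, time );
    CREATE INDEX good_index ON orders ( time, client );
""")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(t, f"client{t % 3}", None) for t in range(100)])

# Unary plus on the GROUP BY term is a no-op for the result, but it
# prevents the planner from using an index to satisfy the grouping,
# which frees it to pick good_index for the time > 10 range condition.
plain = con.execute(
    "SELECT client, count(*) FROM orders WHERE time > 10 "
    "GROUP BY client ORDER BY client").fetchall()
plused = con.execute(
    "SELECT client, count(*) FROM orders WHERE time > 10 "
    "GROUP BY +client ORDER BY client").fetchall()
print(plain == plused)

# Compare the plans (exact wording varies across SQLite versions).
for grouping in ("GROUP BY client", "GROUP BY +client"):
    plan = con.execute(
        "EXPLAIN QUERY PLAN SELECT count(*) FROM orders "
        "WHERE time > 10 " + grouping).fetchall()
    print(grouping, "->", plan)
```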
(3) By anonymous on 2022-01-09 22:14:11 in reply to 2
Hm, running ANALYZE changed the behavior from SCAN to SEARCH using the same wrong index, but queries became much faster. I am now curious what magic info it gathered. But anyway, thanks for the good hint!
(4) By Igor Tandetnik (itandetnik) on 2022-01-09 22:32:27 in reply to 3
Probably it discovered that client has low cardinality (few distinct values). So it could essentially run SELECT count(*) FROM orders WHERE client=:c AND time > 10 for each distinct value :c of orders.client, and return a union of these.
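That per-client strategy can be spelled out manually; this is a sketch over invented sample data showing that the union of per-client counts matches the GROUP BY result, which is what the index on (client, time) lets the engine do with one SEARCH per distinct client.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders ( time INT, client TEXT, item TEXT )")
con.execute("CREATE INDEX wrong_index ON orders ( client, time )")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(t, f"client{t % 3}", None) for t in range(100)])

# Manual version of the optimizer's strategy: for each distinct client,
# a search on the (client, time) index prefix, then the time range.
manual = []
for (c,) in con.execute("SELECT DISTINCT client FROM orders ORDER BY client"):
    (n,) = con.execute(
        "SELECT count(*) FROM orders WHERE client = ? AND time > 10",
        (c,)).fetchone()
    manual.append((c, n))

# The ordinary grouped query for comparison.
grouped = con.execute(
    "SELECT client, count(*) FROM orders WHERE time > 10 "
    "GROUP BY client ORDER BY client").fetchall()

print(manual == grouped)  # the union of per-client counts matches
```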