Simple query leading to a surprisingly poor performing query plan, due to redundant work
(1) By Merijn (merijn) on 2020-04-24 15:41:33 [source]
In the process of optimising a big query I noticed an unexpected drop in performance in the final step and ordering. The query and schema I'm working with are rather big and messy, but fortunately I was able to reproduce the exact same issue in a rather trivial schema and query.
Let's consider the following schema:
CREATE TABLE "Test"("id" INTEGER PRIMARY KEY,"data" TEXT NOT NULL);
I'm doing a full table scan of
Test, joining several other tables to it, running a window function, and ordering the result. Stripped down of all the irrelevant joins, details, and replacing my custom window function we can consider it equivalent to the following query:
SELECT id, data, ROW_NUMBER() OVER (ORDER BY id) AS num FROM Test ORDER BY id;
Which produces the following query plan:
QUERY PLAN |--CO-ROUTINE 2 | `--SCAN TABLE Test `--SCAN SUBQUERY 2
Exactly what I'd like to see and indeed, my own complicated version of this query starts spitting out result rows immediately.
Now, what I really want to do is filter my result based on the output of my window function. Now, we can't use window functions in
WHERE, so I figured I would simply use the above as a subquery and do the filtering in the outer query, like so (I've tried removing ORDER BY from the inner query, but that has no effect):
SELECT id, data, num FROM ( SELECT id, data, ROW_NUMBER() OVER (ORDER BY id) AS num FROM Test ORDER BY id ) WHERE num % 2 = 0 ORDER BY id;
This produces the following query plan:
QUERY PLAN |--CO-ROUTINE 1 | |--CO-ROUTINE 3 | | `--SCAN TABLE Test | `--SCAN SUBQUERY 3 |--SCAN SUBQUERY 1 `--USE TEMP B-TREE FOR ORDER BY
As you can, it now uses a temp B-tree to order the query results, which is redundant as the inner query it's scanning already has the results in the exact same order. The effect this has in my code is rather painful. The full result set is some 100k rows and rather than immediately starting to return result rows as the first example, it now takes 15-30 seconds before the first result row is returned. And this is entirely due to the
ORDER BY on the outer query, if I remove that, the temp B-tree disappears and it returns result immediately.
It seems like the query planner should be able to tell the ordering of the inner and outer queries are identical and that it can thus skip the temp B-tree, but I've not managed to find a way to convince it of that. Now, it seems that SQLite preserves the ordering of the inner query when I drop the
ORDER BY, but this isn't guaranteed and I dislike relying on such implementation details.
So I guess I have two questions:
- Is there any workaround I can use to avoid the slowdown caused by the temp B-tree, and
- is there any chance the query planner can/will be fixed to avoid introducing these redundant B-trees?