Odd, query taking too long

(1) By bobby31 on 2020-07-26 02:17:12 [link] [source]

Here is a weird thing,

the following query takes too long

SELECT P.name FROM People P 
WHERE P.nick = P.name AND 
 (SELECT flow FROM Traffic T WHERE T.name = P.name AND T.checkDate="2020-06-25") >= 500

while if I remove the "checkDate" constraint, it gives me 1300 records instantly.

Is the query not being optimized in the first case ?

BTW I have indexes on all the mentioned fields !

(2) By luuk on 2020-07-26 08:16:22 in reply to 1 [source]

Show all indexes you have, and the EXPLAIN QUERY PLAN.

But it is probably just a missing index on: "name, checkDate" or "checkDate,name" on the table Traffic.

(4) By bobby31 on 2020-07-30 10:47:54 in reply to 2 [link] [source]

Thank you very much. You put me on the right path.

(3) By Keith Medcalf (kmedcalf) on 2020-07-26 16:43:28 in reply to 1 [link] [source]

Define "too long".

In any case, for optimal performance you would need to have the appropriate indexes on Traffic (checkDate, name, flow) and also on People (name, nick), rewrite the query as a join, and then run analyze to give the Query Planner sufficient information to be able to optimize your query.

You will also have to decide on whether those indexes should be UNIQUE or not, and how you want to handle multiple matching rows (ie, by only asking for DISTINCT results, perhaps).

As written the query cannot be "optimized" because you have written it in such a way that the only possible freedom (choosing whether to put People or Traffic) in the outer loop is eliminated.

As written the only thing that will be of any use is for you to optimize the retrieval path by having an index on Traffic (checkData, name, flow) (though this may be created automatically if it does not exist) so that the correlated query can be done as quickly as possible. Nothing else will help.

Having 7 different indexes on 6 different columns of 2 different tables is also not helpful when doing table lookups that require matching multiple columns simultaneously.

At the CLI prompt you can use the .expert command to get recommendations on indexes.

Similarly, the .eqp command is useful for having the query plan displayed so you can see what is going on, without having to prefix the query with EXPLAIN QUERY PLAN.

As an aside, text constants use single quotes. Double quotes are for identifiers. SQLite3 will allow the use of double-quotes but if what is contained in them is a valid identifier in the context in which it is used, then it will be a reference to the identifier and not a text constant. (leading to whining about queries like SELECT * FROM Person WHERE name="name" returning all rows instead of just the one wished for).

The optimizer (query planner) is not magical. It can only optimize within the constraints that you have declared using information that you have make available to it. It does not (in most cases) re-write the query you have given to a more optimal form in order to allow what may be a more optimal generic solution. Query re-writing is a very complex affair and provable correctness is a very difficult and complex problem. There are various heavyweight products which can do very complex query re-writes and optimization analysis -- SQLite3 is not one of them -- it is for the most part constrained to "do what it is told" in the most efficient manner possible rather than "do what it thinks you meant even though that looks nothing like what you told it" as efficiently as possible.

(5) By bobby31 on 2020-07-30 10:53:59 in reply to 3 [link] [source]

Wow ! You pretty much answered ALL my questions, expressed or otherwise !

I'll have to re-read your answer a few times for everything to sink in.

Initially I thought the query optimizer would be smart enough to find the best way to execute the query, but it seems it is not the case. Can't blame it, of course.

I will need to create the indexes on Traffic (checkDate, name, flow) and on People (name, nick).

Also thanks for the side advice, it is much appreciated.