Help speeding up this query
(1) By jose isaias cabrera (jicman) on 2020-09-10 13:28:40 [link] [source]
Greetings!
Any thoughts on how I can make this query faster? I would really appreciated. Thanks.
SELECT a.*,b.*,c.*
FROM Project_List AS a
LEFT JOIN Project_Dashboard AS b ON a.ProjID = b.ProjID
LEFT JOIN Project_Extras AS c ON a.ProjID = c.ProjID
AND
a.PMO_Board_Report != 'No'
AND
a.Status = 'Acknowledged'
AND
Project_Type != 'Outside IT perimeter'
AND
a.Start_date < date('now')
AND
(
(a.Target_Go_Live_Date > substr(date('now'),1,4) || '-01-15') OR
(a.Target_Go_Live_Date = '' AND a.Finish_Date > substr(date('now'),1,4) || '-01-15')
)
AND
(
b.TimeIndicator = '0'
OR
b.CostIndicator = '0'
)
AND
a.InsertDate =
(SELECT MAX(InsertDate) FROM Project_List WHERE a.ProjID = ProjID)
AND
b.InsertDate =
(SELECT MAX(InsertDate) FROM Project_Dashboard WHERE b.ProjID = ProjID)
AND
c.InsertDate =
(SELECT MAX(InsertDate) FROM Project_Extras WHERE c.ProjID = ProjID)
WHERE a.ProjID = b.ProjID AND a.ProjID = c.ProjID
group BY a.Manager, a.ProjID
ORDER BY a.Manager, a.ProjID
;
This is what I get from the .expert function:
(no new indexes)
MULTI-INDEX OR
INDEX 1
SEARCH TABLE Project_Dashboard AS b USING INDEX PDash_TimeIndicator (TimeIndicator=?)
INDEX 2
SEARCH TABLE Project_Dashboard AS b USING INDEX PDash_CostIndicator (CostIndicator=?)
CORRELATED SCALAR SUBQUERY 2
SEARCH TABLE Project_Dashboard USING COVERING INDEX PDash_ProjID_InsertDate (ProjID=?)
SEARCH TABLE Project_List AS a USING INDEX PL_ProjID_InsertDate (ProjID=?)
CORRELATED SCALAR SUBQUERY 1
SEARCH TABLE Project_List USING COVERING INDEX PL_ProjID_InsertDate (ProjID=?)
SEARCH TABLE Project_Extras AS c USING INDEX sqlite_autoindex_Project_Extras_1 (ProjID=?)
CORRELATED SCALAR SUBQUERY 3
SEARCH TABLE Project_Extras USING COVERING INDEX sqlite_autoindex_Project_Extras_1 (ProjID=?)
USE TEMP B-TREE FOR GROUP BY
josé
(2.1) By Gunter Hick (gunter_hick) on 2020-09-10 14:53:03 edited from 2.0 in reply to 1 [link] [source]
The query is not very useful without the schema it is executed against, but I do have a couple of notes: Using LEFT JOIN does not give the QP any leeway to reorder the tables for faster lookup. Are you writing LEFT JOIN because it is needed for the correct answer or just because you like writing it? Duplicating the JOIN ON expressions in the WHERE clause is superflous, and putting expressions in the JOIN ON that do not impact the JOIN strikes me as strange. The JOIN ON condition should only contain expressions that actually impact the JOIN, and the other conditions stated in the WHERE clause. The SELECT MAX() subqueries use the ProjID field from all three tables while it is already known that they have to be identical. This could cause extra work to extract values that are already known. Choose one (probably a.ProjID) and stick to it. Don't retrieve all columns of all participating tables, choose the ones you really need instead. At least, eliminate the ProjID field from two of the tables, as these will contain identical values. SQLite has an optimization for single MAX() aggregate queries that I suspect applies here. This should allow SELECTing MAX(a.InsertDate), MAX(b.InsertDate), MAX(c.InsertDate) from the join and dropping the SELECT MAX(InsertDate) subqueries. Edit: Try ordering the constraints pertaingin to each table so that the most restrictive comes first, thus saving the effort of checking the less restrictive constraints.
(3) By jose isaias cabrera (jicman) on 2020-09-10 16:48:51 in reply to 2.1 [link] [source]
Thanks Gunter. I can fix, at least, 3 of your suggestions because I understand them. :-) I may come back for more later. :-) Gracias.
josé
(4) By Keith Medcalf (kmedcalf) on 2020-09-10 18:52:02 in reply to 3 [link] [source]
The conditions in the WHERE clause make the LEFT superfluous and the ON conditions duplicitous.
Remove the word LEFT and the entire ON conditions (the duplicate conditions), change the first AND to WHERE and the last WHERE to AND.
SELECT a.*,b.*,c.*
FROM Project_List AS a
JOIN Project_Dashboard AS b
JOIN Project_Extras AS c
WHERE
a.PMO_Board_Report != 'No'
AND
a.Status = 'Acknowledged'
AND
Project_Type != 'Outside IT perimeter'
AND
a.Start_date < date('now')
AND
(
(a.Target_Go_Live_Date > substr(date('now'),1,4) || '-01-15') OR
(a.Target_Go_Live_Date = '' AND a.Finish_Date > substr(date('now'),1,4) || '-01-15')
)
AND
(
b.TimeIndicator = '0'
OR
b.CostIndicator = '0'
)
AND
a.InsertDate =
(SELECT MAX(InsertDate) FROM Project_List WHERE a.ProjID = ProjID)
AND
b.InsertDate =
(SELECT MAX(InsertDate) FROM Project_Dashboard WHERE b.ProjID = ProjID)
AND
c.InsertDate =
(SELECT MAX(InsertDate) FROM Project_Extras WHERE c.ProjID = ProjID)
AND a.ProjID = b.ProjID
AND a.ProjID = c.ProjID
group BY a.Manager, a.ProjID
ORDER BY a.Manager, a.ProjID
;
Whats killing you is the three correlated subqueries. You need to generate those into a materialized view first and then use that in the query. Eg:
with maxes(ProjID, pl_insert, pd_insert, pe_insert)
as (
select projid,
max(InsertDate),
(select max(insertdate) from project_dashboard where ProjID = e.projID),
(select max(insertdate) from project_extras where projid = e.projid)
from project list e
group by projid
)
SELECT a.*,b.*,c.*
FROM Project_List AS a
JOIN Project_Dashboard AS b
JOIN Project_Extras AS c
JOIN maxes
WHERE
a.PMO_Board_Report != 'No'
AND
a.Status = 'Acknowledged'
AND
Project_Type != 'Outside IT perimeter'
AND
a.Start_date < date('now')
AND
(
(a.Target_Go_Live_Date > substr(date('now'),1,4) || '-01-15') OR
(a.Target_Go_Live_Date = '' AND a.Finish_Date > substr(date('now'),1,4) || '-01-15')
)
AND
(
b.TimeIndicator = '0'
OR
b.CostIndicator = '0'
)
AND
a.InsertDate = maxes.pl_insert
AND
b.InsertDate = maxes.pd_insert
AND
c.InsertDate = maxes.pe_insert
AND a.ProjID = b.ProjID
AND a.ProjID = c.ProjID
AND a.projid = maxes.projid
group BY a.Manager, a.ProjID
ORDER BY a.Manager, a.ProjID
;
And then you need to "fiddle" to make sure that the subquery "maxes" is getting materialized (if it does not get materialized as is, then maybe making it uselessly DISTINCT (as in SELECT DISTINCT ...) will do it.
(5) By anonymous on 2020-09-10 18:59:43 in reply to 1 [link] [source]
Just curious, how long does the query take now?
(6) By jose isaias cabrera (jicman) on 2020-09-10 19:41:27 in reply to 4 [source]
You are amazing, Keith.
Run Time: real 1.275 user 0.062500 sys 0.312500
versus the old query:
Run Time: real 15.901 user 0.390625 sys 2.125000
It will take me years to understand what you did. Well, I actually understand it. But to be able to come up with it. I am actually going to study it because I have other queries that need the same fix. Thanks.
josé
(7.1) By Keith Medcalf (kmedcalf) on 2020-09-10 23:26:09 edited from 7.0 in reply to 6 [link] [source]
For this part of the query:
with maxes(ProjID, pl_insert, pd_insert, pe_insert)
as (
select projid,
max(InsertDate),
(select max(insertdate) from project_dashboard where ProjID = e.projID),
(select max(insertdate) from project_extras where projid = e.projid)
from project_list e
group by projid
)
select *
from maxes
See if that is what is taking the greatest part of the time. You may be able to speed up the query even more by optimizing this part which basically generates the set of candidates based on the (projid, insertdate) for each of the other tables.
The remaining conditions in the original query are basically just "culling and sorting" those candidates.
(8) By jose isaias cabrera (jicman) on 2020-09-11 02:49:31 in reply to 5 [link] [source]
Just curious, how long does the query take now?
The original query took,
Run Time: real 15.901 user 0.390625 sys 2.125000
or longer, and the fixed query where Keith was instrumental with his fix, took,
Run Time: real 1.275 user 0.062500 sys 0.312500
Thanks.
(9) By jose isaias cabrera (jicman) on 2020-09-11 02:51:09 in reply to 7.1 [link] [source]
I am happy with the 2 seconds or less result. Thanks.
josé