SQLite Forum

Slow query (~20x faster when run as two separate queries)
Login

Slow query (~20x faster when run as two separate queries)

(1) By Izkata on 2020-03-17 03:53:10 [link] [source]

As requested over on Hacker News.

Overview

CREATE TABLE ChargingRates ( Age INTEGER NOT NULL, Type INTEGER NOT NULL, FromPercent REAL NOT NULL, ToPercent REAL NOT NULL, Duration INTEGER NOT NULL, Recorded TEXT DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now', 'localtime')));

High-level description (because hell, there's probably a better way to do this anyway): The table tracks seconds to charge from one percent to another, and this query result is how long it takes to get from the current percent to every percent above up to 100%. The one below is using 31% as the "current" percent. It uses average across all recorded data ("Level0") as a fallback when one percent to the next hasn't been recorded ("Level1").

The inner query (the first of the two in the split-out version) finds duration from one percent to the next, then the outer part sums them up to get cumulative duration.

Combined query

This takes about 0.13s

WITH Durations(FromP, ToP, Seconds) AS (
   SELECT .31, .31, 0
   UNION ALL
   SELECT Durations.FromP, DurationSums.ToP, DurationSums.Duration + Durations.Seconds
      FROM (
         WITH Percentages(P) AS (
            SELECT .31
            UNION ALL
            SELECT ROUND(P + .01, 2) FROM Percentages WHERE P < 1.01
         )
         SELECT
            P AS FromP,
            CASE WHEN ToPercent IS NOT NULL
               THEN ToPercent
               ELSE ROUND(P + .01, 2)
            END AS ToP,
            CASE WHEN ChargingRatesLevel1.Duration IS NOT NULL
               THEN (ToPercent - P) / (ToPercent - FromPercent) * ChargingRatesLevel1.Duration
               ELSE ChargingRatesLevel0.Duration
            END AS Duration
         FROM Percentages
         LEFT JOIN (
            SELECT 1 AS Level, FromPercent AS FromPercent, ToPercent AS ToPercent, AVG(Duration) AS Duration FROM ChargingRates WHERE Type = 1 GROUP BY FromPercent, ToPercent
         ) ChargingRatesLevel1 ON (ChargingRatesLevel1.FromPercent <= P AND ChargingRatesLevel1.ToPercent > P)
         LEFT JOIN (
            SELECT 0 AS Level, AVG(Duration) AS Duration FROM ChargingRates WHERE Type = 1
         ) ChargingRatesLevel0
         WHERE FromP IS NULL OR ToPercent IS NULL OR FromP <> ToPercent
      ) DurationSums
      INNER JOIN Durations ON (Durations.ToP = DurationSums.FromP)
)
SELECT * FROM Durations WHERE ToP <= 1 ORDER BY ToP ASC;

With this query plan

6|0|0|SCAN TABLE Percentages
4|0|0|COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
7|0|0|SCAN TABLE ChargingRates
7|0|0|USE TEMP B-TREE FOR GROUP BY
8|0|0|SCAN TABLE ChargingRates
3|0|0|SCAN SUBQUERY 4
3|1|1|SCAN SUBQUERY 7 AS ChargingRatesLevel1
3|2|2|SCAN SUBQUERY 8 AS ChargingRatesLevel0
3|3|3|SCAN TABLE Durations
1|0|0|COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR ORDER BY

Two queries

Straight copy/pasting that inner query comes out to around .004s

WITH Percentages(P) AS (
   SELECT .31
   UNION ALL
   SELECT ROUND(P + .01, 2) FROM Percentages WHERE P < 1.01
)
SELECT
   P AS FromP,
   CASE WHEN ToPercent IS NOT NULL
      THEN ToPercent
      ELSE ROUND(P + .01, 2)
   END AS ToP,
   CASE WHEN ChargingRatesLevel1.Duration IS NOT NULL
      THEN (ToPercent - P) / (ToPercent - FromPercent) * ChargingRatesLevel1.Duration
      ELSE ChargingRatesLevel0.Duration
   END AS Duration
FROM Percentages
LEFT JOIN (
   SELECT 1 AS Level, FromPercent AS FromPercent, ToPercent AS ToPercent, AVG(Duration) AS Duration FROM ChargingRates WHERE Type = 1 GROUP BY FromPercent, ToPercent
) ChargingRatesLevel1 ON (ChargingRatesLevel1.FromPercent <= P AND ChargingRatesLevel1.ToPercent > P)
LEFT JOIN (
   SELECT 0 AS Level, AVG(Duration) AS Duration FROM ChargingRates WHERE Type = 1
) ChargingRatesLevel0
WHERE FromP IS NULL OR ToPercent IS NULL OR FromP <> ToPercent;

3|0|0|SCAN TABLE Percentages
1|0|0|COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
4|0|0|SCAN TABLE ChargingRates
4|0|0|USE TEMP B-TREE FOR GROUP BY
5|0|0|SCAN TABLE ChargingRates
0|0|0|SCAN SUBQUERY 1
0|1|1|SCAN SUBQUERY 4 AS ChargingRatesLevel1
0|2|2|SCAN SUBQUERY 5 AS ChargingRatesLevel0

And putting those results into a temporary table, the outer query is only about 0.002s

WITH Durations(FromP, ToP, Seconds) AS (
   SELECT .31, .31, 0
   UNION ALL
   SELECT Durations.FromP, DurationSums.ToP, DurationSums.Duration + Durations.Seconds
      FROM DurationSums
      INNER JOIN Durations ON (Durations.ToP = DurationSums.FromP)
)
SELECT * FROM Durations WHERE ToP <= 1 ORDER BY ToP ASC;


3|0|1|SCAN TABLE Durations
3|1|0|SEARCH TABLE DurationSums USING AUTOMATIC COVERING INDEX (FromP=?)
1|0|0|COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR ORDER BY

My understanding is that the AUTOMATIC COVERING INDEX not being created is why the combined version is slow.

sqlite 3.22.0 on Ubuntu 18.04

(2) By Keith Medcalf (kmedcalf) on 2020-03-17 06:00:47 in reply to 1 [link] [source]

A couple of things to do.
(1) at the CLI enter the command
.expert
(2) run your query
(3) create the recommended indexes
(4) run ANALYZE
(5) if you added any new indexes then goto step (1)
(6) rerun your query

Is there any difference now?

(3) By Izkata on 2020-03-18 02:51:25 in reply to 2 [source]

The recommended indexes were:

CREATE INDEX ChargingRates_idx_000119d2 ON ChargingRates(Type);
CREATE INDEX ChargingRates_idx_99a2146b ON ChargingRates(Type, FromPercent, ToPercent);

Query plan for the combined query after adding them:

--EQP-- 6,0,0,SCAN TABLE Percentages
--EQP-- 4,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 7,0,0,SEARCH TABLE ChargingRates USING INDEX ChargingRates_idx_99a2146b (Type=?)
--EQP-- 8,0,0,SEARCH TABLE ChargingRates USING INDEX ChargingRates_idx_000119d2 (Type=?)
--EQP-- 3,0,0,SCAN SUBQUERY 4
--EQP-- 3,1,1,SCAN SUBQUERY 7 AS ChargingRatesLevel1
--EQP-- 3,2,2,SCAN SUBQUERY 8 AS ChargingRatesLevel0
--EQP-- 3,3,3,SCAN TABLE Durations
--EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 0,0,0,SCAN SUBQUERY 1
--EQP-- 0,0,0,USE TEMP B-TREE FOR ORDER BY

The run time seems to fluctuate more, but the average still floats around 0.13s, and the separated ones about 0.005s + 0.002s, with these query plans:

--EQP-- 3,0,0,SCAN TABLE Percentages
--EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 4,0,0,SEARCH TABLE ChargingRates USING INDEX ChargingRates_idx_99a2146b (Type=?)
--EQP-- 5,0,0,SEARCH TABLE ChargingRates USING INDEX ChargingRates_idx_000119d2 (Type=?)
--EQP-- 0,0,0,SCAN SUBQUERY 1
--EQP-- 0,1,1,SCAN SUBQUERY 4 AS ChargingRatesLevel1
--EQP-- 0,2,2,SCAN SUBQUERY 5 AS ChargingRatesLevel0
--EQP-- 3,0,1,SCAN TABLE Durations
--EQP-- 3,1,0,SEARCH TABLE DurationSums USING AUTOMATIC COVERING INDEX (FromP=?)
--EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 0,0,0,SCAN SUBQUERY 1
--EQP-- 0,0,0,USE TEMP B-TREE FOR ORDER BY