SQLite Forum

Poor optimization for query after 3.38.5

(1) By Grunthos on 2023-01-30 13:33:32 [source]

Using the CLI, this query ran reasonably fast (a few seconds) in 3.38.5 but runs for hours in 3.39.0 and 3.40.1. I tried building sqlite3 with exactly the same build settings for each version, just to make sure it was not a configuration issue.
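(For anyone who wants to reproduce the comparison: the plans below can be obtained in the CLI roughly as follows. The database filename is a placeholder, and this is only a sketch, not the exact commands used in this post.)

  -- In the sqlite3 shell, show the plan for every subsequent statement:
  .eqp on
  -- Or prefix a single statement explicitly:
  EXPLAIN QUERY PLAN
  WITH raw AS (SELECT country, date, total_c AS total, delta_c_7d AS delta FROM summary)
  SELECT count(*) FROM raw;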

The underlying problem seems to be a difference in query plan. The 3.38.5 query plan is:

QUERY PLAN
|--MATERIALIZE sums
|  |--MATERIALIZE src
|  |  |--MATERIALIZE init
|  |  |  `--SCAN summary USING INDEX summary_ix
|  |  |--SCAN i
|  |  |--SEARCH summary USING COVERING INDEX summary_ix (country=? AND date>?)
|  |  `--USE TEMP B-TREE FOR ORDER BY
|  |--SCAN src
|  |--SEARCH summary USING INDEX summary_ix (country=? AND date>? AND date<?)
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN sums
|--SEARCH sums USING AUTOMATIC COVERING INDEX (country=? AND date=?)
|--SEARCH sums USING AUTOMATIC COVERING INDEX (country=? AND date=?)
|--SEARCH sums USING AUTOMATIC COVERING INDEX (country=? AND date=?)
`--SEARCH i USING AUTOMATIC COVERING INDEX (country=?)

And the 3.40.1 query plan is:

QUERY PLAN
|--MATERIALIZE mult
|  |--MATERIALIZE sums
|  |  |--MATERIALIZE vals
|  |  |  |--MATERIALIZE src
|  |  |  |  |--MATERIALIZE raw
|  |  |  |  |  `--SCAN summary
|  |  |  |  |--MATERIALIZE init
|  |  |  |  |  |--SCAN raw
|  |  |  |  |  `--USE TEMP B-TREE FOR GROUP BY
|  |  |  |  |--SCAN raw
|  |  |  |  `--SEARCH i USING AUTOMATIC COVERING INDEX (country=?)
|  |  |  |--SCAN raw
|  |  |  `--SEARCH src USING AUTOMATIC COVERING INDEX (country=?)
|  |  |--SCAN vals
|  |  `--USE TEMP B-TREE FOR GROUP BY
|  `--SCAN sums
|--SCAN mult
|--SCAN sums
|--SEARCH mult USING AUTOMATIC COVERING INDEX (country=? AND date=?)
|--SEARCH sums USING AUTOMATIC COVERING INDEX (country=? AND date=?)
`--SEARCH i USING AUTOMATIC COVERING INDEX (country=?)

The query is:

with recursive
        -- Raw data (so reference is in one place)
        raw as (select country, date, total_c as total, delta_c_7d as delta from summary),

        -- Everything below this point is generic for delta 7d stuff
        -- Find the country and min/max date
        init(country, date, fin) as (select country, min(date), max(date) from raw where  total > 0 group by country),
        -- Generate the date stream for each country
        src(country, date) as (select raw.country, raw.date
            from raw join init i on raw.country = i.country and raw.date > i.date
            order by raw.country, raw.date),
        -- Generate the x & y for each entry in the country/date stream
        vals(country, date, x, y) as (select src.country, src.date, julianday(raw.date) - julianday(src.date), log(delta+1)
            from src join raw on raw.country = src.country and raw.date > date(src.date,'-7 days') and raw.date <= src.date and delta >= 0),

        -- Accumulate the data we need
        sums(country, date, x2, x, n, xy, y) as (select country, date, sum(x*x*1.0), sum(x*1.0), sum(1.0), sum(x*y*1.0), sum(y*1.0) from vals group by 1, 2),
        -- use these to calculate to divisor for the inverse matrix
        mult(country, date, m) as (select country, date, 1.0/(x2 * n - x * x) from sums),
        -- Build the inverse matrix
        inv(country, date, a,b,c,d) as (select mult.country, mult.date, n * m, -x * m, -x * m, x2 * m
            from mult join sums on sums.country=mult.country and mult.date=sums.date),
        -- Calculate the coefficients for the least squares fit
        fit(country, date, a, b) as (select inv.country, inv.date, a * xy + b * y, c * xy + d * y
            from inv
            join mult on mult.country = inv.country and mult.date = inv.date
            join sums on sums.country = mult.country and sums.date = mult.date)
    select *, nFin/nPrev - 1 as growth, log(2)/log(nFin/nPrev) as doubling from (
        select f.*, exp(b) - 1 as nFin, exp(a* (-1) + b) - 1 as nPrev
            from fit f join init i on i.country = f.country and f.date <= date(i.fin,'-3 days')
            ) where nPrev > 0 and nFin > 0;

The table definition is:

CREATE TABLE Summary(
  pos,
  country,
  date,
  pct_c,
  total_c,
  delta_c,
  log_c,
  pct_d,
  total_d,
  delta_d,
  log_d,
  pct_r,
  total_r,
  delta_r,
  log_r,
  pct_t,
  total_t,
  delta_t,
  log_t,
  pct_hosp,
  total_hosp,
  delta_hosp,
  log_hosp,
  pct_icu,
  total_icu,
  delta_icu,
  log_icu,
  d_7day,
  d_7day_lo,
  d_7day_hi,
  log_d_7day_lo,
  log_d_7day_hi,
  info,
  ascii,
  total_h_7d,
  total_i_7d,
  delta_d_3d,
  delta_c_3d,
  delta_d_7d,
  delta_c_7d,
  delta_h_7d,
  delta_i_7d,
  delta_d_9d,
  delta_c_9d
);
CREATE UNIQUE INDEX summary_ix on summary(country, date);

The table has approximately 337,000 rows. I have had a look through the release notes and can see nothing obvious to explain the change.

FWIW, the recursive query is just doing basic maths on the underlying table to perform a rolling least squares fit of the data.
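To spell out the maths (my reading of the CTE chain above, added for clarity, not part of the original wording): for each country and date, sums accumulates the quantities Σx², Σx, n, Σxy and Σy over the trailing 7-day window, and mult/inv/fit then solve the ordinary least-squares normal equations for the slope a and intercept b of y ≈ a·x + b:

\[
\begin{pmatrix} \sum x^2 & \sum x \\ \sum x & n \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} \sum xy \\ \sum y \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} a \\ b \end{pmatrix}
= \frac{1}{n\sum x^2 - \left(\sum x\right)^2}
\begin{pmatrix} n & -\sum x \\ -\sum x & \sum x^2 \end{pmatrix}
\begin{pmatrix} \sum xy \\ \sum y \end{pmatrix}
\]

So mult.m is the reciprocal of the determinant, inv is the 2x2 inverse, fit.a/fit.b are the slope and intercept, and nFin = exp(b) - 1 and nPrev = exp(-a + b) - 1 are the fitted values at offsets 0 and -1 days.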

(2) By David Raymond (dvdraymond) on 2023-01-30 14:48:21 in reply to 1 [link] [source]

This doesn't really help answer the question; just a nitpick that there is no actual "recursive" part of this query. No table in your chain of CTEs references itself, only others that were defined earlier in the chain.
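(A minimal illustration of the distinction, not taken from the query above:)

  -- Chained CTEs, as in the query above: each CTE only references earlier ones.
  WITH a(n) AS (SELECT 1),
       b(n) AS (SELECT n + 1 FROM a)
  SELECT n FROM b;

  -- A genuinely recursive CTE references itself inside the UNION ALL:
  WITH RECURSIVE cnt(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM cnt WHERE n < 5
  )
  SELECT n FROM cnt;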

(3) By Richard Hipp (drh) on 2023-01-30 15:58:54 in reply to 1 [link] [source]

Please apply this patch:

--- src/where.c
+++ src/where.c
@@ -3590,17 +3590,17 @@
         ** of X is smaller for views and subqueries so that the query planner
         ** will be more aggressive about generating automatic indexes for
         ** those objects, since there is no opportunity to add schema
         ** indexes on subqueries and views. */
         pNew->rSetup = rLogSize + rSize;
         if( !IsView(pTab) && (pTab->tabFlags & TF_Ephemeral)==0 ){
           pNew->rSetup += 28;
         }else{
-          pNew->rSetup -= 10;
+          pNew->rSetup -= 21;
         }
         ApplyCostMultiplier(pNew->rSetup, pTab->costMult);
         if( pNew->rSetup<0 ) pNew->rSetup = 0;
         /* TUNING: Each index lookup yields 20 rows in the table.  This
         ** is more than the usual guess of 10 rows, since we have no way
         ** of knowing how selective the index will ultimately be.  It would
         ** not be unreasonable to make this value much larger. */
         pNew->nOut = 43;  assert( 43==sqlite3LogEst(20) );
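For readers not familiar with the units in where.c: rSetup is a LogEst value, i.e. roughly ten times the base-2 logarithm of the estimated cost, which is why the assert above expects sqlite3LogEst(20) to be 43. On that reading (my interpretation, not part of the patch), lowering the subtraction from 10 to 21 makes the planner treat building an automatic index on a view or subquery as roughly half as expensive as before:

\[
10\log_2 20 \approx 43.2, \qquad 2^{(21-10)/10} \approx 2.1
\]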

Rerun your query and let me know the timings:

  1. Timing for version 3.38.5
  2. Timing for version 3.40.0 or later
  3. Timing for version 3.40.0 or later plus the patch above.

(4) By Richard Hipp (drh) on 2023-01-30 20:47:22 in reply to 3 [link] [source]

Update to the above.....

I created random test data and was able to confirm that the patch seems to clear the problem without causing any other regressions, at least not in our test data. The change has been checked in.

(5) By Grunthos on 2023-01-31 05:57:04 in reply to 3 [link] [source]

Thanks for the quick reply!

    13.1 seconds - 3.38.5 
12,488   seconds - 3.40.1 Unpatched (3hr 28m 8s)
   130.3 seconds - 3.40.1 Patched
So... a lot better, but still about 10 times worse than 3.38.5.

(7) By Richard Hipp (drh) on 2023-01-31 12:51:57 in reply to 5 [link] [source]

On the random data I generated, I got 3.40.1 patched as being faster than 3.38.5. The difference must be in the data.

Please send me your data so that I can run experiments. You can reach me directly at drh at sqlite dot org to arrange transfer of data that you do not want to make public.

(6.2) By Grunthos on 2023-01-31 06:39:08 edited from 6.1 in reply to 3 [link] [source]

I should have added the query plan: it still does not use the index at all (and the date-range condition is a pretty tight limit).

Patched 3.40.1 query plan:

QUERY PLAN
|--MATERIALIZE mult
|  |--MATERIALIZE sums
|  |  |--MATERIALIZE vals
|  |  |  |--MATERIALIZE src
|  |  |  |  |--MATERIALIZE raw
|  |  |  |  |  `--SCAN summary
|  |  |  |  |--MATERIALIZE init
|  |  |  |  |  |--SCAN raw
|  |  |  |  |  `--USE TEMP B-TREE FOR GROUP BY
|  |  |  |  |--SCAN i
|  |  |  |  `--SEARCH raw USING AUTOMATIC COVERING INDEX (country=?)
|  |  |  |--SCAN src
|  |  |  `--SEARCH raw USING AUTOMATIC PARTIAL COVERING INDEX (country=?)
|  |  |--SCAN vals
|  |  `--USE TEMP B-TREE FOR GROUP BY
|  `--SCAN sums
|--SCAN mult
|--SEARCH sums USING AUTOMATIC COVERING INDEX (country=? AND date=?)
|--SEARCH mult USING AUTOMATIC COVERING INDEX (country=? AND date=?)
|--SEARCH sums USING AUTOMATIC COVERING INDEX (country=? AND date=?)
`--SEARCH i USING AUTOMATIC COVERING INDEX (country=?)

(8) By Richard Hipp (drh) on 2023-01-31 13:02:09 in reply to 6.2 [link] [source]

Please also try rewriting the "fit" subquery as follows:

  -- Calculate the coefficients for the least squares fit
  fit(country, date, a, b) AS (
    SELECT inv.country, inv.date, a * xy + b * y, c * xy + d * y
      FROM sums, mult, inv
     WHERE sums.country = mult.country AND sums.date = mult.date
       AND mult.country = inv.country AND mult.date = inv.date
  )

For my random test data, this "fit" rewrite runs about twice as fast on an unpatched 3.40.1 as it does under 3.38.5.

(9.7) By Grunthos on 2023-02-01 11:36:46 edited from 9.6 in reply to 8 [link] [source]

Revised times:

Version             Run time         Query
Unpatched 3.40.1    2m 11s (131s)    revised query
3.38.5              15.3 seconds     revised query
3.38.5              13.3 seconds     original query

(10) By Grunthos on 2023-02-01 11:27:53 in reply to 8 [link] [source]

Would you like me to send the data? Compressed (xz), it's about 16 MB.

(11) By Richard Hipp (drh) on 2023-02-01 15:56:31 in reply to 1 [link] [source]

Note to the public: Grunthos sent me the database via a private side channel.

I think this issue has now been resolved, as of check-in 66f29c403d28630b. The query should be fast again using the latest trunk check-in, or using the "Prerelease Snapshot" on the Download page. In my tests, the latest trunk is a little faster (though not dramatically so) than 3.38.5.

Please report back if you see otherwise.

This issue arose coincident with the addition of MATERIALIZED and NOT MATERIALIZED keywords for CTEs. When a CTE is tagged as MATERIALIZED, then it is always materialized as-is and is not subject to the query flattening optimization. This is a way of overriding the query optimizer. I copied this from PostgreSQL.

However, if a CTE was being used more than once, I was marking it as if it had had the MATERIALIZED keyword, to prevent it from being implemented using a co-routine. But this had the adverse side effect of disabling the query flattening optimization for the CTE. Without query flattening, the query ran more slowly.

The fix was to enhance the function that decides whether or not to implement a CTE as a co-routine so that it became unnecessary to tag such CTEs as if they had the MATERIALIZED keyword. Now the query flattener optimization is permitted and the overall query is fast again.
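For illustration (not from the original post; these statements just reuse the schema above to show the syntax of the hints mentioned):

  -- Force the CTE to be materialized once into a transient table,
  -- opting out of the flattening optimization:
  WITH t AS MATERIALIZED (
    SELECT country, date FROM summary
  )
  SELECT count(*) FROM t;

  -- Let the planner flatten the CTE into the outer query,
  -- even though it is referenced twice:
  WITH t AS NOT MATERIALIZED (
    SELECT country, date FROM summary
  )
  SELECT count(*) FROM t AS a JOIN t AS b ON a.country = b.country AND a.date = b.date;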

Would that all query planner fixes were this simple...

(12) By Grunthos on 2023-02-02 03:43:37 in reply to 11 [link] [source]

Wonderful work, wonderful explanation. Also, wonderful result: It is indeed slightly faster than 3.38.5!

Thank you for your assistance in this.

(13) By Grunthos on 2023-02-02 05:34:22 in reply to 11 [link] [source]

As due diligence, I compared the output. For many rows the results differ only in the final digits, but for some rows the values are significantly different (I used printf("%.4e") for all fp values to look for gross differences).

I am still investigating the reasons (probably rounding, maybe algorithm).

(14) By Spindrift (spindrift) on 2023-02-02 07:55:24 in reply to 13 [link] [source]

You haven't specified any ordering, so that's not particularly surprising.

Have you tried the original setup but specifying

PRAGMA reverse_unordered_selects = 1;
just to make sure you aren't implicitly relying on a certain order of results?

(15) By Grunthos on 2023-02-02 13:06:52 in reply to 14 [link] [source]

I just tried that, but I did add an 'order by' to the final select (to make comparison easier); the numbers were identical in reversed/non-reversed runs using 3.38.5. Ditto for the reversed/non-reversed runs using the snapshot.

Comparing these (sorted) results, what I'm seeing are near-zero values in the snapshot version, vs actual zeroes in 3.38.5.

It is (I think) my logic: For example, in the new version, for one row, I get a value for 'a' of -2.22044604925031e-16, whereas in 3.38.5, it comes out as exactly 0. This results in a slight discrepancy between 'nFin' and 'nPrev' in the new version, which means that 'growth' is a tiny negative value (-2.22044604925031e-16), which results in 'doubling' being -3.12165738408268e+15. Whereas in 3.38.5, they are 0 and NaN, respectively.

I am still looking into exact causes, but these almost-zero values occur in exactly the same places in 3.38.5 and 3.40.1, and in different places in the snapshot, so I believe it is just a minor artifact.
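(Added note, not from the original post: that particular magnitude is the unit round-off of IEEE-754 double precision, which is exactly the residue one expects when a sum that is analytically zero is evaluated in a different order:)

\[
2^{-52} \approx 2.220446049250313 \times 10^{-16}
\]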

(16) By Cam DBA (chris.brogan) on 2023-02-08 18:29:51 in reply to 11 [link] [source]

Our product uses a View created with a complex, multi-part CTE. Its performance appears to be affected by the optimization issues discussed in this thread.

Through detailed benchmark testing, using the CTE as a control, we have observed the following performance inconsistencies between these SQLite versions:

v3.37.2 - Poor performance observed when using the View.
v3.38.5 - Poor performance observed when using the View.
v3.39.0 - Good performance observed for both the CTE and View.
v3.39.4 - Good performance observed for both the CTE and View.
v3.40.0 - Poor performance observed for both the CTE and View.
v3.40.1 - Poor performance observed for both the CTE and View. 
v3.41.0 - Great performance observed for both the CTE and View.

For v3.41.0, the performance for both the CTE and the View was MUCH better than even observed in v3.39.x.

Notes:
    * We obtained v3.41.0 via the Pre-Release download called "sqlite-snapshot-202302011541.tar.gz"
    * Detailed test results, methods and scripts can be provided to SQLite staff if needed.

Our hope is that the changes made for v3.41.0 are soon available as a formal release and stabilized in future versions.