SQLite User Forum

Regression from 3.46.1 traced to multiple joins with views

(1.1) Originally by Balaji Ramanathan (balaji) with edits by Richard Hipp (drh) on 2025-01-17 13:16:57 from 1.0 [link] [source]

I reported a regression in SQLite query performance between 3.46.1 and 3.47 in this forum post (https://sqlite.org/forum/forumpost/1e25669426). I have done some more testing, and the regression seems to specifically affect queries with multiple inner joins between tables (which have indexes on the joined columns) and views (which obviously don't have indexes).

If you look at the example database provided earlier (https://drive.google.com/file/d/1mEEK6tBC9UkbQYJ89p5tJo3NktpR1CkZ/view?usp=sharing), there is a view called TripDetailsRaw, which is simply a bunch of inner joins involving various tables and views.

3.46.1 runs a select against this view in well under a second. 3.47.2 takes about 45 seconds to run the same query.

I have verified that all the individual views and tables involved in the joins can be selected from in microseconds, so it is only the final join between all of them that seems to have become much slower because of some change between 3.46.1 and 3.47.

Is there anything else you can suggest to make this easier to investigate? Thank you.

(2) By Mike Castle (nexushoratio) on 2025-01-03 23:41:07 in reply to 1.0 [link] [source]

Random thought, but considering other recent threads regarding views in the last couple of days:

Does tossing a random MATERIALIZE on any/all of the views help in any way?

(3) By Balaji Ramanathan (balaji) on 2025-01-04 16:47:48 in reply to 2 [link] [source]

Is there a way to materialize views? I thought the MATERIALIZED keyword was only for CTEs. The syntax diagram for creating a view does not have any mention of an option to materialize the view (https://sqlite.org/lang_createview.html).

(4) By Mike Castle (nexushoratio) on 2025-01-04 17:20:46 in reply to 3 [link] [source]

I'm sorry, I was wrong.

I conflated the recent forum discussion regarding CTE performance issues with my own unrelated research on VIEWs that I was doing at the same time.

Still, if I understand things correctly, both VIEWs and CTEs are "simply" ways of making it easier to write complex or reusable queries. And the recent discussion demonstrated that use of the MATERIALIZED keyword "hint" for CTEs can impact performance (both positively and negatively).

So, while it may not work on VIEWs directly, maybe some experiments with rewriting the view as a CTE and playing with MATERIALIZED might yield some interesting information?
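
Something like this, purely as a sketch (made-up table and column names; the real body of the view's SELECT would go inside the CTE):

  -- Hypothetical names, only to show the shape of the experiment:
  WITH MyViewBody AS MATERIALIZED (
    SELECT t.ID, l.Name
    FROM SomeTable t
    INNER JOIN SomeLookup l ON l.ID = t.LookupID
  )
  SELECT * FROM MyViewBody;

Swapping MATERIALIZED for NOT MATERIALIZED (or dropping the hint) would show whether forcing one strategy or the other changes anything.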

(5) By Max (Maxulite) on 2025-01-05 11:24:57 in reply to 1.0 [link] [source]

I don't know whether the developers have time to explore the problem in detail. I played with the db a little and narrowed it down to a smaller subquery of the view in question.

select *
from Trip
inner join TripCost on Trip.TripID = TripCost.TripID
inner join TripCalculatedValues on Trip.TripID = TripCalculatedValues.TripID
inner join AllTripNotes on Trip.TripID = AllTripNotes.TripID
inner join AllTripGroups on Trip.TripID = AllTripGroups.TripID
inner join AllTripGroupTripNotes on Trip.TripID = AllTripGroupTripNotes.TripID

v. 3.47.2 spends more than a minute and reads more than 15GB of data (the db is 6MB). Changing the order of the joins sometimes drops the traffic to megabytes, but affects the time just a little.

v. 3.46.0 spends about a second and reads about 4MB

As for the full select inside TripDetailsRaw, to trigger the good/bad times one can leave about 9-10 joins in different parts of the body and temporarily comment/uncomment one of them.

(6) By Richard Hipp (drh) on 2025-01-19 19:53:25 in reply to 1.1 [link] [source]

It appears that version 3.46.1 is the only version that runs the query quickly. All prior versions (3.46.0 and before) are slow and all subsequent versions (3.47.0 and later) are slow. Would I be correct to guess that you originally developed this query using SQLite 3.46.1 and that you never tested it on any other version of SQLite prior to upgrading to 3.47.0?

(8) By Max (Maxulite) on 2025-01-19 22:59:53 in reply to 6 [link] [source]

Not the OP, but I'd like to point out that the query [select * from TripDetailsRaw] runs quickly in 3.39.4 and 3.43.0 (randomly chosen dlls), so there are probably other quick versions as well. The same holds for the narrowed-down body of the TripDetailsRaw view I posted in another reply.

(9) By Max (Maxulite) on 2025-01-20 11:10:14 in reply to 6 [link] [source]

Some additional information, as I see it, with the OP's original database.

What I call a quick query may depend on the media (SSD vs. rotating disk) and on caching; in any case, it is one that executes in 1 to 5 seconds. A slow one takes more than a minute to execute.

My bisection across the public versions points to 3.47.0 as the version that introduced the slowdown (3.46.1 is the quick one). The view, the body of the view, and many queries involving dimension tables all hit the slow path. The release page for that version is impressive, but I'd say it may have something to do with "Adjustments to the query planner so that it produces better plans for star queries with a large number of dimension tables", because in many of the experiments I described earlier, the execution time changed when the number of joined tables was around 10-11.

(10.1) By Max (Maxulite) on 2025-01-20 12:52:15 edited from 10.0 in reply to 6 [link] [source]

Searching the timeline before the 3.47.0 release date, the following check-in ("Use a heuristic in the query planner to help it better cope with large star schema queries") looked suspicious, and reverting to the previous logic made the queries run quickly again (so select * from TripDetailsRaw was slow in 3.47.0 and became quick with only this change):

 
  /* Revert to the pre-3.47 heuristic: */
  mxChoice = (nLoop<=1) ? 1 : (nLoop==2 ? 5 : 10);
  /* The 3.47.0 code, commented out:
  if( nLoop<=1 ){
    mxChoice = 1;
  }else if( nLoop==2 ){
    mxChoice = 5;
  }else{
    mxChoice = computeMxChoice(pWInfo, nRowEst);
  }
  */
  

I didn't revert other small changes; I hope they are not relevant. So computeMxChoice probably makes suboptimal decisions for queries like the OP's.

(11) By Richard Hipp (drh) on 2025-01-20 13:00:03 in reply to 10.1 [link] [source]

So do you get good performance if you port that change to the latest trunk check-in? I do not. Did I do something wrong?

Note also that making that change speeds up OP's query, but slows down a bunch of others. I need a fix that makes all queries fast.

(12.1) By Max (Maxulite) on 2025-01-20 15:56:27 edited from 12.0 in reply to 11 [source]

I'm not sure about the trunk; I could not build the amalgamation on Windows (details below, just in case). With 3.48.0 the patch works in my case. Could it be that your test database has already acquired some ANALYZE statistics that affect the outcome?

As for the amalgamation, building it on Windows/msys2 used to work. I followed my own instructions from the past, but the configure script complains about "C:/msys64/mingw32/bin/tclsh is not a file" when there's an actual "C:\msys64\mingw32\bin\tclsh.exe" file present.

(13) By Stephan Beal (stephan) on 2025-01-20 16:09:38 in reply to 12.1 [link] [source]

I followed my own instructions from the past, but the configure script complains about "C:/msys64/mingw32/bin/tclsh is not a file" when there's an actual "C:\msys64\mingw32\bin\tclsh.exe" file present

That's an over-zealous check which will be fixed and checked in momentarily. Here's a patch in case you want to work around that without having to wait on the next release:

Index: auto.def
==================================================================
--- auto.def
+++ auto.def
@@ -613,13 +613,11 @@
 
   set doConfigLookup 1 ; # set to 0 to test the tclConfig.sh-not-found cases
   if {"" ne $with_tclsh} {
     # --with-tclsh was provided or found above. Validate it and use it
     # to trump any value passed via --with-tcl=DIR.
-    if {![file isfile $with_tclsh]} {
-      proj-fatal "TCL shell $with_tclsh is not a file"
-    } elseif {![file-isexec $with_tclsh]} {
+    if {![file-isexec $with_tclsh]} {
       proj-fatal "TCL shell $with_tclsh is not executable"
     } else {
       define TCLSH_CMD $with_tclsh
       #msg-result "Using tclsh: $with_tclsh"
     }

The removed part does not account for implicit ".exe" extensions on Windows but the "file-isexec" check does.

(The patch for trunk is functionally the same but in a different file - that content has been moved around since the 3.48 release.)

(14) By Max (Maxulite) on 2025-01-20 18:12:34 in reply to 13 [link] [source]

Thanks, that partially worked, though I had to apply the patch to sqlite-config.tcl (in auto.def everything was different: no such text, line numbers, etc.). Probably this is what you meant by "different file". The script advanced but stopped at

WARNING: C:/msys64/mingw32/bin/tclsh is unable to recommend a tclConfig.sh
Error: couldn't execute "C:\msys64\home\user\sqlite\tool\tclConfigShToAutoDef.sh": no such file or
directory

but tclConfigShToAutoDef.sh is there

(15) By Stephan Beal (stephan) on 2025-01-20 18:22:14 in reply to 14 [link] [source]

Probably this is what you meant by "different file".

Exactly - that stuff was moved yesterday in prep for an upcoming change.

but tclConfigShToAutoDef.sh is there

That... is a mystery, and i don't currently have a Windows machine to explore it. Just coincidentally, as of today we're working on getting me access to one, so i should be able to reproduce and resolve this sometime within the next week or so.

Hypothetically, though...

What happens if you change this part:

  eval [exec "$srcdir/tool/tclConfigShToAutoDef.sh" "$cfg"]

to:

  eval [exec /bin/sh "$srcdir/tool/tclConfigShToAutoDef.sh" "$cfg"]

My suspicion is that that will work (it does here in Unix-land but the /bin/sh is superfluous here).

(16) By Max (Maxulite) on 2025-01-20 19:25:17 in reply to 15 [link] [source]

Thanks, Stephan. Your variant, slightly modified, somehow made things better:

  eval [exec sh "$srcdir/tool/tclConfigShToAutoDef.sh" "$cfg"]

configure finished. But now "make sqlite3.c" stops with:

eval 
C:/msys64/home/user/sqlite/tool/mksqlite3c.tcl:92: Error: format string too long or invalid time
Traceback (most recent call last):
  File "C:/msys64/home/user/sqlite/tool/mksqlite3c.tcl", line 92, in fconfigure
    clock format -129 -format {%Y-%m-%d %H:%M:%S UTC} -gmt 1
make: *** [C:/msys64/home/user/sqlite/main.mk:1080: sqlite3.c] Error

(17) By Max (Maxulite) on 2025-01-20 19:38:57 in reply to 16 [link] [source]

Nevermind, fixed with truncating to {%Y-%m-%d %H:%M:%S}

As for the original question from drh, the trunk also falls into the same pattern (slow without the patch, quick with the patch).

(18) By Stephan Beal (stephan) on 2025-01-20 20:55:47 in reply to 17 [link] [source]

Your variant, slightly modified, somehow made things better

Great, thank you. That's now patched in the trunk and 3.48 branch.

Nevermind, fixed with truncating to {%Y-%m-%d %H:%M:%S}

Can you tell us which tclsh version is choking on that part?

i'm unable to reproduce this with either jimtcl (which comes embedded in the source tree), tcl8.6, 8.7, or 9.0, but don't have 8.5 handy to test with.

The value -129 is, according to the code, coming from your system clock, and corresponds to the final 3 minutes of 1969.
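
For reference, SQLite's own date functions show exactly which moment that is:

  SELECT datetime(-129, 'unixepoch');   -- 1969-12-31 23:57:51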

(19) By Max (Maxulite) on 2025-01-20 21:54:52 in reply to 18 [link] [source]

Can you tell us which tclsh version is choking on that part?

$ echo 'puts $tcl_version;exit 0' | tclsh
8.6

$ pacman -Ss mingw-w64-i686-tcl
mingw32/mingw-w64-i686-tcl 8.6.12-2 [installed]

I suppose it's not the oldest release line (8.6), but an up-to-date build of it (8.6.12-2). I don't know whether $tcl_version is supposed to report the full version number.

(21) By Max (Maxulite) on 2025-01-21 08:07:55 in reply to 11 [link] [source]

Another possible explanation for the mismatch between our observations is in your comment on another issue, where the SQLITE_DEBUG constant affected execution time. Maybe that is also the case here.

I looked at the sources and the comments around the routines in question. Though I don't have technical suggestions (the algorithms and logic are complex enough to require more time to explore), I have some side notes.

Looking at the OP's data and at my own guesses about it, there is a psychological phenomenon at work here. I'm sure it has been discussed many times, but I think it is worth mentioning. When you have a transactional table plus lookup tables, it is natural to use left joins, because most of the time some of the data may be missing and that is allowed. But when you are sure that no information is missing and the transactional table has no NULLs in its lookup columns, inner joins become equivalent to left joins in terms of results. For the engine and the planner, however, they are two different worlds. Select * from TripDetailsRaw looks exactly like this case (judging by the body of the view): it returns the same 4048 rows whether inner joins or left joins are used. I may be wrong, but the OP might as well use left joins in all similar cases in his database with the same outcome. This observation makes me think that computeMxChoice is worth improving only if other kinds of queries show regressions, because I'm not sure that using inner joins between a transactional table and its lookup tables is a good choice anyway. One can be sure today that no NULL lookup references exist, but that is not guaranteed forever.
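
To illustrate with a made-up minimal schema (not the OP's actual tables):

  -- Hypothetical schema, only to illustrate the point:
  CREATE TABLE Currency(CurrencyID INTEGER PRIMARY KEY, Code TEXT);
  CREATE TABLE Purchase(
    PurchaseID INTEGER PRIMARY KEY,
    CurrencyID INTEGER NOT NULL REFERENCES Currency(CurrencyID)
  );

  -- With CurrencyID declared NOT NULL and every value present in Currency,
  -- these two queries return identical rows, but the planner treats them
  -- differently (an outer join constrains the legal join orders):
  SELECT Purchase.PurchaseID, Currency.Code
  FROM Purchase INNER JOIN Currency ON Purchase.CurrencyID = Currency.CurrencyID;

  SELECT Purchase.PurchaseID, Currency.Code
  FROM Purchase LEFT JOIN Currency ON Purchase.CurrencyID = Currency.CurrencyID;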

The other side note is a question. I know that testing for performance (and for regressions related to it) is challenging; just the last few weeks have seen several posts about unexpected performance regressions. In the naive case it means keeping large datasets around and spending execution time on them. But would it be possible for a test to consist of a schema plus an explainer (like EXPLAIN QUERY PLAN) that produces calculable execution estimates when supplied with table sizes, with the test keeping several sets of different table sizes? When a new optimization is added, an automatic routine could compare the estimates of the previous version with the new ones and warn if, for some of the table-size sets, the new estimates are significantly higher. No need to keep datasets or execute against them. Does SQLite have something like this?
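
The closest user-side approximation I can think of is to hand the planner fabricated statistics and then diff the plans across versions, something along these lines (untested sketch; the index name is made up):

  ANALYZE;                           -- creates sqlite_stat1 if it is missing
  DELETE FROM sqlite_stat1;
  -- Pretend table sizes: Trip has 4000 rows, and a hypothetical index
  -- TripCost_TripID has 4000 entries with 1 row per distinct TripID.
  INSERT INTO sqlite_stat1(tbl, idx, stat) VALUES
    ('Trip', NULL, '4000'),
    ('TripCost', 'TripCost_TripID', '4000 1');
  ANALYZE sqlite_master;             -- make the planner reload the edited stats
  EXPLAIN QUERY PLAN SELECT * FROM TripDetailsRaw;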

(20) By Balaji Ramanathan (balaji) on 2025-01-20 23:07:20 in reply to 6 [link] [source]

I have been using SQLite for at least the last 5 years, and I have not had any speed issues with previous versions.

(7) By CMDR Vulkarius (CMDR_Vulkarius) on 2025-01-19 21:28:03 in reply to 1.1 [link] [source]

I had a similar problem recently here -
https://sqlite.org/forum/forumpost/0f527f061f

For me, the problem was that the table-valued JSON functions were being demoted below indexed tables in the query planner in SQLite > 3.23.3.
At least as a diagnostic, perhaps try EXPLAIN QUERY PLAN in each version, check the join order, and try using CROSS JOIN to force the recursion order.
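
For example, something along these lines (untested; table names taken from the narrowed-down query posted earlier in this thread):

  -- Compare what each version chooses:
  EXPLAIN QUERY PLAN SELECT * FROM TripDetailsRaw;

  -- Pin the nesting order with CROSS JOIN (SQLite never reorders tables
  -- across a CROSS JOIN):
  SELECT *
  FROM Trip
  CROSS JOIN TripCost
  CROSS JOIN TripCalculatedValues
  WHERE Trip.TripID = TripCost.TripID
    AND Trip.TripID = TripCalculatedValues.TripID;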

If indexing/analysis doesn't fix it but CROSS JOIN does, that's a regression according to the next-gen query planner documentation -
https://sqlite.org/queryplanner-ng.html#howtofix