Feature request: add index info to WITH clause table
I have some use cases where I read several lookup tables from text files. I do this in WITH clauses where I do filtering and apply formatting for further use steps.
Then in the main query I want to left outer join these tables (based on time stamps and time ranges as the only link of data - loose relationship).
Because my table preparation within the WITH clause is reading text files using one of my virtual table extension, there is no support by indexes; they do not exist.
If I understand it correctly from what I read in all the documentation about SQLite3 over the time then each of the left outer join steps have to scan the pre-prepared tables and cannot (yet) simply lookup an index for faster access.
That's why I suggest to think about an extension of the WITH clause by some syntax to give some additional index information.
Of cause I could use temporary tables and temporary indexes - but I do not want to fill any database (not even a temporary) when it is not necessary - the source of information can remain in their text files. Only my (very complex) query is contained in the SQLite3 database and reading from a subfolder relative allows me to simply copy the database to somewhere else to run the same query with different data without any changes to it.
To give some hints about my use case: I am using SQLite3 version 3.36.0 with several of my extension functions and virtual tables. Windows OS. The result table has 29000 rows and the total query time took 544 seconds to get them while looking up just one table with contained 29000 records. The main table without filtering consists of 255000 lines out of 2 text files with a total of 27MB size (there could be many more of them and also the size could be much bigger). My database which contains the complex query has just 98kB size.
The Query Plan is short but seems to be not as efficient as it could become when supported by an index:
id parent notused detail 3 0 0 MATERIALIZE ViewStateMachineState 6 3 0 SCAN AllStateMachineText VIRTUAL TABLE INDEX 0: 25 0 0 SCAN AllPLCcommLog VIRTUAL TABLE INDEX 0: 47 0 0 SCAN L
How can accessing of WITH clause generated volatile tables be supported with indexes? At least with one primary index would be great!
The Virtual table could probably help but at the moment of reading the text from file there was not filtering on any extracted data happening yet. And as one and the same virtual table file reading process can be used in independent views it is also not neccessary to generate always all possible index info when the use in the with clause requires just one to support performance.
My proposal: as the WITH clause allows repetition with a comma the separated index could be constructed like a select is done. All is kept temporary and only as much as the whole statement requires and as long as the statement connection lives. That would really be a great push and improvement to many of my use cases, where the complex statement is used only once - to fill the result table for further processing and presentations.
with LookupTable( TimeStamp, LookupContent ) as MATERIALIZE ( select substr(Content,1,24) as TimeStamp , trim(substr(Content,26,40)) as LookupContent from VirtualLookupTextTable where instr(Content,'keyword') ), LookupIndex as ( create primary key( TimeStamp ) on LookupTable ) select FromTimeStamp, ToTimeStamp, M.Content, L.Content from VirtualMainTextTable M left outer join LookupTable L on M.ToTimeStamp>M.FromTimeStamp and L.TimeStamp > M.FromTimeStamp and L.TimeStamp <= M.ToTimeStamp or L.TimeStamp == M.ToTimeStamp
EDIT: typo detected and corrected and added content from the lookup table to the final result
(2) By David Raymond (dvdraymond) on 2021-07-15 12:56:38 in reply to 1.2 [link] [source]
I'm with you on adding INDEXes to CTE but I hate your syntax proposal. :)
I feel your pain
Actually, I've written a lot of large queries where having such a feature would have been a time saver. It would have also prevented me from rewriting a "clean" CTE-query into a complex write-only equivalent (I was unable to understand it after a few hours even with extensive comments.)
No prior art
There is no other SQL implementation that has such a feature (or at least no implementation that I am aware of.)
So the syntax has to be "invented". My reasoning is that it should also fit with the SQL way of expressing things: I mean it has to avoid mixing a heavy DDL syntax (CREATE INDEX) in a relatively pure query syntax (the CTE.)
My suggestion is twofold:
table-contraintwith an INDEX branch. As a bonus, it will make creating INDEXes possible from inside CREATE TABLE. Note that it is already possible to do so in MySQL. The suggested extension will reuse parts I highlighted in the current CREATE INDEX syntax:
linerad = 10px linewid *= 0.5 $h = 0.21 circle radius 10% A0: arrow 2*arrowht CR: oval "CREATE" fit UQ: oval "UNIQUE" fit with .w at (linewid right of CR.e,.8*$h below CR) IX: oval "INDEX" fit with .w at (linewid right of UQ.e,CR) arrow from CR.e right even with UQ; arrow to IX.w arrow from CR.e right linerad then down even with UQ then to UQ.w line from UQ.e right linerad then up even with IX then to arrowht left of IX.w arrow from IX.e right oval "IF" fit arrow right 2*arrowht oval "NOT" fit arrow 2*arrowht ETS: oval "EXISTS" fit # IF NOT EXISTS bypass Y1: .5*$h below UQ.s # vertical position of back-arrow arrow from IX.e right linerad then down even with Y1 then left even with UQ arrow from ETS.e right then down even with Y1 then left even with ETS.w line left even with IX.w # second row arrow left even with first circle then down $h*1.25 then right 2*arrowht SN: oval "schema-name" fit arrow 2*arrowht DOT: oval "." bold fit arrow oval "index-name" fit arrow 2*arrowht oval "ON" fit arrow 2*arrowht oval "table-name" fit arrow 2*arrowht LP: oval "(" bold fill red fit arrow IXC: box "indexed-column" fill red fit arrow RP: oval ")" bold fit fill red # Loop over indexed-column CMA: oval "," bold fit at $h*1.25 below IXC fill red arrow from IXC.e right linerad then down even with CMA then to CMA.e line from CMA.w left even with 2*arrowht left of IXC.w \ then up even with IXC then right linerad # schema-name bypass arrow from (first circle,SN.n) down even with $h below SN \ then right even with SN line right even with arrowht right of DOT.e then up even with DOT \ then right linerad # WHERE clause and end circle WH: oval "WHERE" fit with .w at IXC.w+(linewid,-$h*3.5) fill red arrow right 2*arrowht WEX: box "expr" fit fill red arrow right even with linewid*1.5 right of RP.e circle same fill red # Linkage from the last ")" to the WHERE clause or or circle arrow from RP.e right linerad then down even with CMA J1: previous.end line down even with last circle then to arrowht left of last circle.w arrow from J1 down even with $h below CMA then left even with CMA arrow left even with 2*arrowht left of WH.w then down even with WH \ then to WH.w
- Extend CTE declaration to allow
table-contraints to be used inside the column names declaration (just as what one can already do in CREATE TABLE).
Your example using my suggested syntax
WITH LookupTable( TimeStamp, LookupContent, PRIMARY KEY( TimeStamp ) ) as MATERIALIZE ( select substr(Content,1,24) as TimeStamp , trim(substr(Content,26,40)) as LookupContent from VirtualLookupTextTable where instr(Content,'keyword') ) select FromTimeStamp, ToTimeStamp, M.Content, L.Content from VirtualMainTextTable M left outer join LookupTable L on M.ToTimeStamp>M.FromTimeStamp and L.TimeStamp > M.FromTimeStamp and L.TimeStamp <= M.ToTimeStamp or L.TimeStamp == M.ToTimeStamp
Another example using INDEX inside the columns definition of the CTE
WITH LookupTable( a, b, INDEX (a, b), INDEX (b) WHERE a < 10 ) as MATERIALIZE (...) SELECT ...