SQLite Forum

Feature request: add index info to WITH clause table
Login

Feature request: add index info to WITH clause table

(1.2) By MBL (RoboManni) on 2021-07-15 11:30:39 edited from 1.1 [link] [source]

I have some use cases where I read several lookup tables from text files. I do this in WITH clauses where I do filtering and apply formatting for further use steps.

Then in the main query I want to left outer join these tables (based on time stamps and time ranges as the only link of data - loose relationship).

Because my table preparation within the WITH clause is reading text files using one of my virtual table extension, there is no support by indexes; they do not exist.

If I understand it correctly from what I read in all the documentation about SQLite3 over the time then each of the left outer join steps have to scan the pre-prepared tables and cannot (yet) simply lookup an index for faster access.

That's why I suggest to think about an extension of the WITH clause by some syntax to give some additional index information.

Of cause I could use temporary tables and temporary indexes - but I do not want to fill any database (not even a temporary) when it is not necessary - the source of information can remain in their text files. Only my (very complex) query is contained in the SQLite3 database and reading from a subfolder relative allows me to simply copy the database to somewhere else to run the same query with different data without any changes to it.

To give some hints about my use case: I am using SQLite3 version 3.36.0 with several of my extension functions and virtual tables. Windows OS. The result table has 29000 rows and the total query time took 544 seconds to get them while looking up just one table with contained 29000 records. The main table without filtering consists of 255000 lines out of 2 text files with a total of 27MB size (there could be many more of them and also the size could be much bigger). My database which contains the complex query has just 98kB size.

The Query Plan is short but seems to be not as efficient as it could become when supported by an index:

id  parent  notused  detail
3   0       0        MATERIALIZE ViewStateMachineState
6   3       0        SCAN AllStateMachineText VIRTUAL TABLE INDEX 0:
25  0       0        SCAN AllPLCcommLog       VIRTUAL TABLE INDEX 0:
47  0       0        SCAN L

How can accessing of WITH clause generated volatile tables be supported with indexes? At least with one primary index would be great!

The Virtual table could probably help but at the moment of reading the text from file there was not filtering on any extracted data happening yet. And as one and the same virtual table file reading process can be used in independent views it is also not neccessary to generate always all possible index info when the use in the with clause requires just one to support performance.

My proposal: as the WITH clause allows repetition with a comma the separated index could be constructed like a select is done. All is kept temporary and only as much as the whole statement requires and as long as the statement connection lives. That would really be a great push and improvement to many of my use cases, where the complex statement is used only once - to fill the result table for further processing and presentations.

with 
LookupTable( TimeStamp, LookupContent ) as MATERIALIZE (
   select substr(Content,1,24)        as TimeStamp
        , trim(substr(Content,26,40)) as LookupContent 
     from VirtualLookupTextTable
    where instr(Content,'keyword')
),
LookupIndex as (
   create primary key( TimeStamp ) on LookupTable
)
select FromTimeStamp, ToTimeStamp, M.Content, L.Content
  from VirtualMainTextTable   M
  left outer join LookupTable L
               on M.ToTimeStamp>M.FromTimeStamp and L.TimeStamp > M.FromTimeStamp and L.TimeStamp <= M.ToTimeStamp
                                                 or L.TimeStamp == M.ToTimeStamp

EDIT: typo detected and corrected and added content from the lookup table to the final result

(2) By David Raymond (dvdraymond) on 2021-07-15 12:56:38 in reply to 1.2 [link] [source]

I know SQLite has logic where it can create Automatic Indexes to help with queries. Perhaps if that was combined with AS MATERIALIZED hints to let it know it's ok to consider making an index on the materialization of the with statement?

Or is that already the case?

(3.1) By mzm2021 on 2021-07-15 14:22:09 edited from 3.0 in reply to 1.2 [source]

Short version

I'm with you on adding INDEXes to CTE but I hate your syntax proposal. :)

Long version

I feel your pain

Actually, I've written a lot of large queries where having such a feature would have been a time saver. It would have also prevented me from rewriting a "clean" CTE-query into a complex write-only equivalent (I was unable to understand it after a few hours even with extensive comments.)

No prior art

There is no other SQL implementation that has such a feature (or at least no implementation that I am aware of.)

So the syntax has to be "invented". My reasoning is that it should also fit with the SQL way of expressing things: I mean it has to avoid mixing a heavy DDL syntax (CREATE INDEX) in a relatively pure query syntax (the CTE.)

The proposal

My suggestion is twofold:

CREATE UNIQUE INDEX IF NOT EXISTS schema-name . index-name ON table-name ( indexed-column ) , WHERE expr
     linerad = 10px
     linewid *= 0.5
     $h = 0.21

     circle radius 10%
A0:  arrow 2*arrowht
CR:  oval "CREATE" fit
UQ:  oval "UNIQUE" fit with .w at (linewid right of CR.e,.8*$h below CR)
IX:  oval "INDEX" fit with .w at (linewid right of UQ.e,CR)
     arrow from CR.e right even with UQ; arrow to IX.w
     arrow from CR.e right linerad then down even with UQ then to UQ.w
     line from UQ.e right linerad then up even with IX then to arrowht left of IX.w
     arrow from IX.e right
     oval "IF" fit
     arrow right 2*arrowht
     oval "NOT" fit
     arrow 2*arrowht
ETS: oval "EXISTS" fit

     # IF NOT EXISTS bypass
Y1:  .5*$h below UQ.s  # vertical position of back-arrow
     arrow from IX.e right linerad then down even with Y1  then left even with UQ
     arrow from ETS.e right then down even with Y1 then left even with ETS.w
     line left even with IX.w

     # second row
     arrow left even with first circle then down $h*1.25 then right 2*arrowht
SN:  oval "schema-name" fit
     arrow 2*arrowht
DOT: oval "." bold fit
     arrow
     oval "index-name" fit
     arrow 2*arrowht
     oval "ON" fit
     arrow 2*arrowht
     oval "table-name" fit
     arrow 2*arrowht

LP:  oval "(" bold fill red fit
     arrow
IXC: box "indexed-column" fill red fit
     arrow
RP:  oval ")"  bold fit    fill red     

     # Loop over indexed-column
CMA: oval "," bold fit at $h*1.25 below IXC fill red
     arrow from IXC.e right linerad then down even with CMA then to CMA.e
     line from CMA.w left even with 2*arrowht left of IXC.w \
       then up even with IXC then right linerad

     # schema-name bypass
     arrow from (first circle,SN.n) down even with $h below SN \
       then right even with SN
     line right even with arrowht right of DOT.e then up even with DOT \
        then right linerad

     # WHERE clause and end circle
WH:  oval "WHERE" fit with .w at IXC.w+(linewid,-$h*3.5) fill red
     arrow right 2*arrowht
WEX: box "expr" fit fill red
     arrow right even with linewid*1.5 right of RP.e
     circle same fill red

     # Linkage from the last ")" to the WHERE clause or or circle
     arrow from RP.e right linerad then down even with CMA
J1:  previous.end
     line down even with last circle then to arrowht left of last circle.w
     arrow from J1 down even with $h below CMA then left even with CMA
     arrow left even with 2*arrowht left of WH.w then down even with WH \
       then to WH.w
  • Extend CTE declaration to allow table-contraints to be used inside the column names declaration (just as what one can already do in CREATE TABLE).

Your example using my suggested syntax

WITH 
LookupTable( TimeStamp, LookupContent, PRIMARY KEY( TimeStamp ) ) as MATERIALIZE (
   select substr(Content,1,24)        as TimeStamp
        , trim(substr(Content,26,40)) as LookupContent 
     from VirtualLookupTextTable
    where instr(Content,'keyword')
)
select FromTimeStamp, ToTimeStamp, M.Content, L.Content
  from VirtualMainTextTable   M
  left outer join LookupTable L
               on M.ToTimeStamp>M.FromTimeStamp and L.TimeStamp > M.FromTimeStamp and L.TimeStamp <= M.ToTimeStamp
                                                 or L.TimeStamp == M.ToTimeStamp

Another example using INDEX inside the columns definition of the CTE

WITH 
LookupTable( a, b, INDEX (a, b), INDEX (b) WHERE a < 10 ) as MATERIALIZE (...)
SELECT ...

(4) By MBL (RoboManni) on 2021-07-15 15:40:25 in reply to 3.1 [link] [source]

YES, I like your proposal to take the CTE table declaration similar to a "CREATE TABLE" statement to include the index declarations.