Correct Index Not Being Used
(1) By SeverKetor on 2022-10-08 16:41:08 [source]
I have a query which uses (among other things) EXISTS(SELECT 1 FROM secondary_table WHERE main_table.TextColumn LIKE '%'||secondary_table.String||'%')
. This is a bit slow so I try to filter rows out prior to doing that check. However, secondary_table
is fairly static, so I'm considering just creating an index on TextColumn LIKE '%String1%' OR TextColumn LIKE '%String2%' etc.
, generating the query when needed using secondary_table
, and recreating the index on the rare occasion secondary_table
is updated.
That part's all fine and/or dandy, but the problem comes when running the generated query. SQLite just refuses to use the index I made, unless I force it to by using INDEXED BY
, even after running ANALYZE
. Both the index and the test query are using the exact same copy-and-pasted list of LIKE
s, so it's not a case where a minor difference confuses SQLite. This happens even when I only do SELECT TextColumn FROM main_table WHERE TextColumn LIKE '%String1%' OR TextColumn LIKE '%String2%' etc.
as a query on its own. For reference, there are 13 strings that are being checked.
I also tested out using TextColumn REGEXP 'String1|String2|etc.'
in the query and index, and in addition to be slower, it also had the same problem of not using its index unless INDEXED BY
was used.
Is this a bug to be fixed, or a case where the documentation's warning against using INDEXED BY
to optimize queries should be ignored? Or is there some other way to convince SQLite to use the index?
(2.1) By Keith Medcalf (kmedcalf) on 2022-10-08 17:01:58 edited from 2.0 in reply to 1 [link] [source]
If you are looking to determine if string B is contained in string A, then the correct test is: INSTR(A, B) > 0
not A LIKE ('%' || B || '%')
The purpose of LIKE is entirely different. LIKE does 47 things, of which you only need 1, and that 1 thing that you are trying to accomplish is the one (and only) function of the INSTR function.
If you insist on using LIKE then you will probably have to deal with the other 46 things that LIKE does that you need to kaibosh. Good luck with that.
(3) By SeverKetor on 2022-10-08 17:13:17 in reply to 2.1 [link] [source]
Actually, I use 3 things FROM LIKE
. The other 2 being case-insensitivity (trivially possible if I used LOWER
before INSTR
though) and wildcards (I already use them, and will likely use more in the future). LIKE
is the most natural fit in this case, I'd say.
That said, I might still be able to use INSTR
and a slightly more complex setup (I think all I'll ever need from LIKE
is '%', and I can cope with INSTR(TextColumn, String1A) AND INSTR(TextColumn, String1B)
sometimes having a different result than `TextColumn LIKE '%String1A%String1B%').
(4) By SeverKetor on 2022-10-08 17:33:47 in reply to 3 [link] [source]
Well, scratch the INSTR
idea. When I redo the query and index to use INSTR
, SQLite still has the problem where it can't figure out it can use the index.
(5) By John Dennis (jdennis) on 2022-10-09 06:49:08 in reply to 4 [link] [source]
Just out of interest, what is the performance difference between using SQLite's chosen query plan, or that forced by the use of INDEXED BY?
(6) By SeverKetor on 2022-10-09 16:55:12 in reply to 5 [link] [source]
For just the sub-query, about a 9x improvement. For the original version of the overall query I was testing, about 4x. For a faster version I stumbled into while testing out stuff after posting, only about a 2x improvement.
(7) By Donal Fellows (dkfellows) on 2022-10-10 13:13:45 in reply to 1 [link] [source]
Is this a bug to be fixed, or a case where the documentation's warning against using INDEXED BY to optimize queries should be ignored?
I remember being told by DRH in a tutorial that LIKE
doesn't use the column's index if the pattern has a wildcard at the beginning because indices are sorted from the beginning (because sorting from the middle makes no sense). This would seem to be the issue here. (I also believe that REGEXP
never uses indices, as SQLite doesn't spend a lot of effort on analysing regular expressions.)
If you're doing many more of these queries than you are doing alterations to the set of queries (and given that the set of queries is actually very small), it might be worth creating another table that says for each row which of the queries it matches. That can be done ahead of time and indexed, and lookups against that should be very fast. Even keeping things up to date when the data in the main table changes should be pretty easy; it's only altering the set of special queries that will be expensive.
(8) By SeverKetor on 2022-10-11 05:17:07 in reply to 7 [link] [source]
The initial wildcard wasn't the problem here. What I've finally caught on to is the fact I was using the index slightly differently than normal: WHERE Col LIKE 'constant' OR Col LIKE 'constant2' etc
; where as normally you do WHERE IndexedCol='whatever'
. Once I switched to WHERE (Col LIKE 'constant' OR etc.)=1
, SQLite was immediately able to start using the index voluntarily.
I did a little more testing as well, and all function indexes I tried (INSTR with a constant, squaring a column, and a custom function) all had the same problem if I excluded an '=Whatever' part. Given this means I was essentially trying to use an index for a <expr>!=0
term, which also doesn't work, then it makes sense.
I wonder if it would be possible for SQLite to be changed to make it work without having to use (<expr> OR <expr> OR etc.)>0
for the simpler and (to me) more intuitive syntax. However, I fully expect that to not be as simple of a change as it sounds, and therefore not happen.
(9) By anonymous on 2022-10-11 11:08:34 in reply to 8 [link] [source]
Post SQL to reproduce your issue and solution and many more might spend time on it...
Philip
(10) By SeverKetor on 2022-10-11 14:42:31 in reply to 9 [link] [source]
Not only is this thread already resolved (barring the unlikely case where SQLite being enhanced in the way I described in my last post), I posted the snippet of what I believed to be the culprit in the first post and was ultimately proven correct.