SQLite Forum

A thought on LIKE/GLOB optimization

A thought on LIKE/GLOB optimization

(1) By anonymous on 2021-08-26 18:45:45 [link] [source]

I'm dealing with data that mostly is made up of digits, but needs to be
treated as character strings.  So my columns are mostly declared as

But when I try to do something like "WHERE foo LIKE '1234%'", these
combine to break the conditions for the LIKE optimization, because the
collating sequence defaults to BINARY, so I have to do things like
WHERE foo BETWEEN '1234' AND '1234~' or, more correctly for general
data, WHERE foo >= '1234' AND foo < '1235'.  (In passing, would it be
sane to suggest "BETWEEN '1234' INCLUSIVE AND '1235' EXCLUSIVE" or
something of the sort?)

Declaring the relevant column(s) as TEXT COLLATE NOCASE does indeed get
me the optimizations, but it occurs to me that the optimization could
be run anyway for something like LIKE '1234%', because 1, 2, 3, and 4
sort in the same positions relative to all other characters under both
BINARY and NOCASE.  It looks to me as though this is true for all
characters except [ \ ] ^ _ ` (the characters between Z and a).  So
would it be unreasonable to suggest extending that optimization to
BINARY collating columns if the pattern being matched against doesn't
contain any of those characters (or, to get even fancier, by truncating
it for index-search purposes just before the first such character)?

...or am I missing something?

(2) By Igor Tandetnik (itandetnik) on 2021-08-29 15:29:21 in reply to 1 [link] [source]

"This is true for all characters except..." is incorrect. 'a' sorts after 'Z' in BINARY, but between 'A' and 'B' in NOCASE.

(3.1) By Keith Medcalf (kmedcalf) on 2021-08-30 00:00:46 edited from 3.0 in reply to 2 [source]

That is untrue.

'A' sorts before 'a' in BINARY.
'A' sorts equal 'a' in NOCASE.
Mutatis mutandis all the single-byte latin alphabetic characters [A-Z] and [a-z].

While it is technically true that 'a' collate nocase between 'A' and 'B' is indeed true, so is 'A' collate nocase between 'a' and 'b'.

For further clarity and to avoid boundary conditions (because I am sure that someone somewhere will claim that 'oh, thats because they are equal') it should be noted that both of the following are also true:

'M' collate nocase between 'a' and 'z'
'm' collate nocase between 'A' and 'Z'

(4) By J.M. Aranda (JMAranda) on 2021-08-30 18:14:18 in reply to 3.1 [link] [source]

Clarification: the expression "Mutatis Mutandis" which belongs to classical Latin, is equivalent to current "up to isomorphism".

(5) By Keith Medcalf (kmedcalf) on 2021-08-30 18:44:14 in reply to 4 [link] [source]

Mutatis Mutandis generally means "making the same/necessary change".