SQLite Forum

A thought on LIKE/GLOB optimization
Login
I'm dealing with data that mostly is made up of digits, but needs to be
treated as character strings.  So my columns are mostly declared as
TEXT.

But when I try to do something like "WHERE foo LIKE '1234%'", these
combine to break the conditions for the LIKE optimization, because the
collating sequence defaults to BINARY, so I have to do things like
WHERE foo BETWEEN '1234' AND '1234~' or, more correctly for general
data, WHERE foo >= '1234' AND foo < '1235'.  (In passing, would it be
sane to suggest "BETWEEN '1234' INCLUSIVE AND '1235' EXCLUSIVE" or
something of the sort?)

Declaring the relevant column(s) as TEXT COLLATE NOCASE does indeed get
me the optimizations, but it occurs to me that the optimization could
be run anyway for something like LIKE '1234%', because 1, 2, 3, and 4
sort in the same positions relative to all other characters under both
BINARY and NOCASE.  It looks to me as though this is true for all
characters except [ \ ] ^ _ ` (the characters between Z and a).  So
would it be unreasonable to suggest extending that optimization to
BINARY collating columns if the pattern being matched against doesn't
contain any of those characters (or, to get even fancier, by truncating
it for index-search purposes just before the first such character)?

...or am I missing something?