A thought on LIKE/GLOB optimization
(1) By anonymous on 2021-08-26 18:45:45 [link] [source]
I'm dealing with data that mostly is made up of digits, but needs to be treated as character strings. So my columns are mostly declared as TEXT. But when I try to do something like "WHERE foo LIKE '1234%'", these combine to break the conditions for the LIKE optimization, because the collating sequence defaults to BINARY, so I have to do things like WHERE foo BETWEEN '1234' AND '1234~' or, more correctly for general data, WHERE foo >= '1234' AND foo < '1235'. (In passing, would it be sane to suggest "BETWEEN '1234' INCLUSIVE AND '1235' EXCLUSIVE" or something of the sort?) Declaring the relevant column(s) as TEXT COLLATE NOCASE does indeed get me the optimizations, but it occurs to me that the optimization could be run anyway for something like LIKE '1234%', because 1, 2, 3, and 4 sort in the same positions relative to all other characters under both BINARY and NOCASE. It looks to me as though this is true for all characters except [ \ ] ^ _ ` (the characters between Z and a). So would it be unreasonable to suggest extending that optimization to BINARY collating columns if the pattern being matched against doesn't contain any of those characters (or, to get even fancier, by truncating it for index-search purposes just before the first such character)? ...or am I missing something?
(2) By Igor Tandetnik (itandetnik) on 2021-08-29 15:29:21 in reply to 1 [link] [source]
"This is true for all characters except..." is incorrect.
'a' sorts after
BINARY, but between
(3.1) By Keith Medcalf (kmedcalf) on 2021-08-30 00:00:46 edited from 3.0 in reply to 2 [link] [source]
That is untrue.
'A' sorts before 'a' in BINARY.
'A' sorts equal 'a' in NOCASE.
Mutatis mutandis all the single-byte latin alphabetic characters [A-Z] and [a-z].
While it is technically true that
'a' collate nocase between 'A' and 'B' is indeed true, so is
'A' collate nocase between 'a' and 'b'.
For further clarity and to avoid boundary conditions (because I am sure that someone somewhere will claim that 'oh, thats because they are equal') it should be noted that both of the following are also true:
'M' collate nocase between 'a' and 'z'
'm' collate nocase between 'A' and 'Z'
(4) By J.M. Aranda (JMAranda) on 2021-08-30 18:14:18 in reply to 3.1 [link] [source]
Clarification: the expression "Mutatis Mutandis" which belongs to classical Latin, is equivalent to current "up to isomorphism".
(5) By Keith Medcalf (kmedcalf) on 2021-08-30 18:44:14 in reply to 4 [source]
Mutatis Mutandis generally means "making the same/necessary change".