Overriding collation of IN expressions
(1.1) Originally by Mark Lawrence (mark) with edits by Richard Hipp (drh) on 2020-05-20 15:59:55 from 1.0 [source]
The documentation at  on collation says:
The rules for determining which collating function to use for a binary comparison operator (=, <, >, <=, >=, !=, IS, and IS NOT) are as follows: 1. If either operand has an explicit collating function assignment using the postfix COLLATE operator, then the explicit collating function is used for comparison, with precedence to the collating function of the left operand.
It also mentions the following:
The expression "x IN (SELECT y ...)" is handled in the same way as the expression "x = y" for the purposes of determining the collating sequence. The collating sequence used for expressions of the form "x IN (y, z, ...)" is the collating sequence of x.
Given the above I would have expected that SELECT with an IN expression to behave the same as a SELECT with an '=' expression when COLLATE NOCASE is added to either one. But that does not seem to be the case:
-- 3.32.0 2020-04-18 14:12:00 d5b0def96ba6d90f47bc96fab1ccf9c501d84885d086744035b16fd96f3e248c CREATE TABLE a(b TEXT); INSERT INTO a VALUES ('one'), ('ONE'); SELECT * FROM a WHERE b = 'one' COLLATE nocase; -- b -- ---------- -- one -- ONE SELECT * FROM a WHERE b IN ('one') COLLATE nocase; -- b -- ---------- -- one
This appear to be an oversight in (hopefully) the implementation or the documentation.
In the syntax diagram for expressions, we see that the COLLATE operator is applied to an "expr". The COLLATE operator has high precedence, so it binds more tightly that most everything else. For the phrase:
b IN ('one') COLLATE nocase
('one')" part is not an expression. That subphrase is part of the
IN expression. Hence, the expression must be parsed thusly:
(b IN ('one')) COLLATE nocase
So, in other words, the COLLATE operator is being applied to the result of the IN operator, not to the operands of the IN operator.
Contrast the above with the equality constraint:
b == 'one' COLLATE nocase
Here, the "
'one'" part is an expression, and so the COLLATE operator
is able to bind to it. And since COLLATE has higher precedence than ==,
we get the following parse:
b == ('one' COLLATE nocase);
So here, the COLLATE operator applies to the right operand of the == operator and thus affects the operation of the == operator.
If you want to apply a collation to an IN operator, you need to put the COLLATE operator on the left operand, which is an expression. Hence, you probably want this:
b COLLATE nocase IN ('one')
Which parses out as:
(b COLLATE nocase) IN ('one')
Thanks Richard. That makes it clear why it works (or doesn't) the way it does.
New syntax I learned today:
sqlite> select 'b' collate nocase = 'B'; 'b' collate nocase = 'B' ------------------------ 1
This positioning of collate is new to me... Can I suggest an additional sentence in datatype3.html:
The collating sequence used for expressions of the form "x IN (y, z, ...)" is the collating sequence of x. If an explicit collating sequence is required it should be applied to the left hand side of the expression: "x COLLATE nocase IN (y,z, ...)"