2020-03-05
| ||
16:26 | • Fixed ticket [1b8d7264]: MAX yields unexpected result for UTF-16 plus 6 other changes (artifact: bf4e521e user: drh) | |
16:13 | Change the sqlite3.pDfltColl (the default collating sequence for the database connection) so that it is the collating sequence appropriate for the database encoding, not the UTF8 collating sequence. This helps to ensure that the database encoding collation is always used, even for expressions that do not have an defined collating sequence. Ticket [1b8d7264567eb6fc]. (check-in: 4a585189 user: drh tags: trunk) | |
2020-03-04
| ||
22:56 | • New ticket [1b8d7264] MAX yields unexpected result for UTF-16. (artifact: f1ec275c user: mrigger) | |
Ticket Hash: | 1b8d7264567eb6fc9a6149757a94d58e365ee704 | |||
Title: | MAX yields unexpected result for UTF-16 | |||
Status: | Fixed | Type: | Code_Defect | |
Severity: | Minor | Priority: | Low | |
Subsystem: | Unknown | Resolution: | Fixed | |
Last Modified: | 2020-03-05 16:26:30 | |||
Version Found In: | 3.32.0 | |||
User Comments: | ||||
mrigger added on 2020-03-04 22:56:29:
Consider the following statements: PRAGMA encoding = 'UTF-16'; CREATE TABLE t0(c0 TEXT); INSERT INTO t0(c0) VALUES ('윆'), (1); SELECT MAX(CASE 1 WHEN 1 THEN t0.c0 END) FROM t0; -- 윆 SELECT MAX(t0.c0) FROM t0; -- 1 I would expect both queries to produce the same result, however, the first returns '윆', while the second returns 1. When using UTF-8, both queries return '윆'. drh added on 2020-03-05 16:26:30: The default sort order for text is whatever order the text is rendered in using memcmp() as a comparison function. That order is different depending on the text encoding. UTF-8 is mostly the same as UTF-16BE, except for a few corner cases involving surrogate pairs. But UTF-16LE is quite different. (This surprising fact about text sort order needs to be discussed on the "quirks" documentation page.) The fact that the sort order of text is different depending on the text encoding is less of a problem than it might seem at first because:
The problem reported by this ticket arose because SQLite was using a UTF8 collating sequence for some, but not all, expressions in a database that had a UTF16LE text encoding. The fix was to ensure that the same collating sequence is used for *all* comparisons within the same database. This ticket is labeled "minor" in severity because of the observation that it impacts hardly any databases in the wild. |