Query results change after adding index to UTF-16be database
(1) By Rashad Sookram (rsookram) on 2025-01-31 03:30:31 [source]
PRAGMA encoding = 'UTF-16be';
CREATE TABLE Example(word TEXT NOT NULL);
CREATE INDEX Example_word on Example(word);
INSERT INTO Example VALUES('み');
SELECT * FROM Example WHERE word GLOB 'み*';
In the example above, no rows were returned for the query, but I expected the 'み' row to be returned. I thought that the query would be equivalent to the following:
SELECT * FROM Example WHERE word >= 'み' AND word < 'む';
This query returned the row as expected however, so it looks like something went wrong for the GLOB
query. This problem only happens when using UTF-16be and having an index on the column being searched. If either a different encoding is used for the database, or the index isn't added, the GLOB
works as expected.
Seen in SQLite version 3.46.1 on Linux. I also tried on a recent commit and it was the same there.
(2) By Bo Lindbergh (_blgl_) on 2025-01-31 12:44:09 in reply to 1 [link] [source]
Here's the bug: SQLite generates the upper bound of the range by incrementing the last byte of the UTF-8 representation of the prefix. This generates garbage when that last byte is x'BF'
, but the garbage still works as long as the database uses UTF-8.
'み'
in UTF-8 is x'E381BF'
.
(3) By Dan Kennedy (dan) on 2025-01-31 15:56:28 in reply to 2 [link] [source]
(4) By Rashad Sookram (rsookram) on 2025-01-31 16:08:57 in reply to 3 [link] [source]
Thanks! I built the latest code and it works for me too.