fts5 and diacritics
(1) By anonymous on 2021-09-30 21:33:12
I'm using FTS5 with SQLite version 3.35 and am trying to search for the character 'O', expecting it to prefix-match any O that carries a diacritic. This works for many of the common diacritics, but to find 'Ø' I have to search for 'Ø' specifically.
INSERT INTO test(DESCRIPTION, TEXT_TO_MATCH) VALUES ('regular', 'Oz');
INSERT INTO test(DESCRIPTION, TEXT_TO_MATCH) VALUES ('circumflex', 'Ôz');
INSERT INTO test(DESCRIPTION, TEXT_TO_MATCH) VALUES ('stroke', 'Øz');
CREATE virtual table ftsTest using fts5(DESCRIPTION, TEXT_TO_MATCH, content=test, tokenize = "unicode61 remove_diacritics 2");
INSERT INTO ftsTest (ftsTest) VALUES('rebuild');
select * from ftsTest where TEXT_TO_MATCH match '"Oz"'; -- doesn't match "Øz"; matches "Oz" and "Ôz"
select * from ftsTest where TEXT_TO_MATCH match '"Øz"'; -- matches "Øz"
Many of the other diacritics for O seem to match correctly. Do I have the FTS table misconfigured? Is this a bug? Or maybe I misunderstand how diacritics work? Any help would be appreciated.
(2) By anonymous on 2021-10-01 07:34:00 in reply to 1
Unicode doesn't define decompositions for the letters ø and Ø (nor for the diameter sign ⌀).
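You can check this outside SQLite with Python's standard unicodedata module. The sketch below is only an illustration: strip_marks is a hypothetical helper that decomposes to NFD and drops combining marks, which approximates diacritic removal but is not the code the unicode61 tokenizer actually runs.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop combining marks (category Mn).

    Hypothetical helper for illustration; not FTS5's implementation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# U+00D4 (O with circumflex), U+00D8 and U+00F8 (O/o with stroke),
# U+2300 (diameter sign)
for ch in ("\u00d4", "\u00d8", "\u00f8", "\u2300"):
    # An empty decomposition string means Unicode defines no mapping.
    print(f"U+{ord(ch):04X}", repr(unicodedata.decomposition(ch)), strip_marks(ch))
```

Only U+00D4 reports a decomposition (base letter O plus a combining circumflex), so it is the only one that folds to a plain O. The stroke letters come back unchanged, which is consistent with 'Oz' not matching 'Øz' in the queries above.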