SQLite Forum

fts5 and diacritics
Hi,

I'm using FTS5 with SQLite version 3.35. I'm trying to search for the character 'O' with the expectation that a prefix query will also match the same letter when it carries a diacritic. This works for many of the common diacritics, but to find 'Ø' I have to search for 'Ø' specifically.

Example:

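-- content table for the FTS index (schema assumed: two untyped columns)
CREATE TABLE test(DESCRIPTION, TEXT_TO_MATCH);

INSERT INTO test(DESCRIPTION, TEXT_TO_MATCH) VALUES ('regular', 'Oz');
INSERT INTO test(DESCRIPTION, TEXT_TO_MATCH) VALUES ('circumflex', 'Ôz');
INSERT INTO test(DESCRIPTION, TEXT_TO_MATCH) VALUES ('stroke', 'Øz');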

CREATE VIRTUAL TABLE ftsTest USING fts5(DESCRIPTION, TEXT_TO_MATCH, content=test, tokenize = "unicode61 remove_diacritics 2");

INSERT INTO ftsTest (ftsTest) VALUES('rebuild');

SELECT * FROM ftsTest WHERE TEXT_TO_MATCH MATCH '"Oz"*';  -- matches "Oz" and "Ôz", but not "Øz"
SELECT * FROM ftsTest WHERE TEXT_TO_MATCH MATCH '"Øz"*';  -- matches "Øz"

Many of the other accented forms of O are matched correctly by a plain 'O' query. Do I have the FTS table misconfigured? Is this a bug? Or am I misunderstanding how diacritic removal works? Any help would be appreciated.
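In case it helps narrow this down, here is a small Python check of how Unicode itself treats these characters. It does not call SQLite at all. It rests on the assumption, which I have not verified, that the tokenizer's diacritic removal roughly follows Unicode decompositions. The helper name `strip_marks` is just something I made up for illustration.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Escapes are used so the precomposed code points are unambiguous:
# U+00D4 is O with circumflex, U+00D8 is O with stroke.
for word in ("Oz", "\u00d4z", "\u00d8z"):
    first = word[0]
    # decomposition() returns '' when the character has no decomposition
    print(word, repr(unicodedata.decomposition(first)), strip_marks(word))
```

On my understanding of the Unicode data, 'Ô' decomposes into 'O' plus a combining circumflex, while 'Ø' has no decomposition and so survives this kind of stripping unchanged. If the tokenizer works along similar lines, that might explain the difference, but I'd appreciate confirmation.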

Thanks in advance!