Issue with snippet() and highlight() with FTS5 in trigram tokenization mode
(1) By anonymous on 2023-10-23 13:05:29
Using an example from the FTS5 docs:
-- Assuming this:
CREATE VIRTUAL TABLE ft USING fts5(a);
INSERT INTO ft VALUES('a b c x c d e');
INSERT INTO ft VALUES('a b c c d e');
INSERT INTO ft VALUES('a b c d e');
-- The following SELECT statement returns these three rows:
-- '[a b c] x [c d e]'
-- '[a b c] [c d e]'
-- '[a b c d e]'
SELECT highlight(ft, 0, '[', ']') FROM ft WHERE ft MATCH 'a+b+c AND c+d+e';
We get the expected result:
[a b c] x [c d e]
[a b c] [c d e]
[a b c d e]
If we instead use the trigram tokenizer with a similar example:
CREATE VIRTUAL TABLE ft2 USING fts5(a, tokenize="trigram");
INSERT INTO ft2 VALUES('abc x cde');
INSERT INTO ft2 VALUES('abc cde');
INSERT INTO ft2 VALUES('abcde');
SELECT highlight(ft2, 0, '[', ']') FROM ft2 WHERE ft2 MATCH 'abc AND cde';
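Now the third row is wrong. The matches "abc" and "cde" overlap (they share the "c"), but instead of being combined into a single range they are highlighted separately and part of the text is repeated: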
[abc] x [cde]
[abc] [cde]
[abc]de[cde]
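For illustration, here is a minimal Python sketch of the merging the poster expected. It is not FTS5's implementation, and the helper names `find_ranges`, `merge_ranges` and `highlight_merged` are hypothetical. The trigram tokenizer indexes every three-character sequence, so 'abcde' becomes the tokens abc, bcd and cde. The phrases 'abc' and 'cde' therefore match tokens 0 and 2: they do not overlap as token positions, yet their character ranges (0-3 and 2-5) share the "c", which is presumably why they were not combined. The sketch works on character offsets instead. It finds every occurrence of each phrase with plain, case-sensitive substring search (an assumption that ignores the tokenizer's case folding), merges ranges that share at least one character, and wraps each merged range in the markers. Ranges that merely touch are kept separate, mirroring how adjacent phrases stay separate in '[a b c] [c d e]'.

```python
def find_ranges(text: str, phrases: list[str]) -> list[tuple[int, int]]:
    """Half-open (start, end) character ranges for every occurrence of each
    phrase; overlapping occurrences are all reported."""
    ranges = []
    for phrase in phrases:
        start = text.find(phrase)
        while start != -1:
            ranges.append((start, start + len(phrase)))
            start = text.find(phrase, start + 1)
    return ranges


def merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Combine ranges that share at least one character. Ranges that merely
    touch (end == next start) are left separate."""
    merged: list[list[int]] = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def highlight_merged(text: str, phrases: list[str],
                     open_mark: str = "[", close_mark: str = "]") -> str:
    """Hypothetical client-side stand-in for highlight(): wrap each merged
    match range in open_mark/close_mark."""
    pieces = []
    pos = 0
    for start, end in merge_ranges(find_ranges(text, phrases)):
        pieces.append(text[pos:start])  # unmatched text before the range
        pieces.append(open_mark + text[start:end] + close_mark)
        pos = end
    pieces.append(text[pos:])  # trailing unmatched text
    return "".join(pieces)


if __name__ == "__main__":
    for row in ("abc x cde", "abc cde", "abcde"):
        print(highlight_merged(row, ["abc", "cde"]))
    # prints:
    # [abc] x [cde]
    # [abc] [cde]
    # [abcde]
```

Run as a script, it prints the three rows with the overlapping matches in the third row combined into the single range [abcde].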
Is this a known issue? Does anyone know of any workarounds?
(2) By Dan Kennedy (dan) on 2023-10-24 16:08:21 in reply to 1
Thanks for reporting this. It was a bug, of course. It should now be fixed here:
https://sqlite.org/src/info/e952db86faaafd2e
Dan.
(3) By anonymous on 2023-10-25 10:57:36 in reply to 2
Thanks Dan, much appreciated!