SQLite User Forum

Issue with snippet() and highlight() with fts5 in trigram tokenization mode

(1) By anonymous on 2023-10-23 13:05:29

Using an example from the FTS5 docs:

-- Assuming this:
CREATE VIRTUAL TABLE ft USING fts5(a);
INSERT INTO ft VALUES('a b c x c d e');
INSERT INTO ft VALUES('a b c c d e');
INSERT INTO ft VALUES('a b c d e');

-- The following SELECT statement returns these three rows:
--   '[a b c] x [c d e]'
--   '[a b c] [c d e]'
--   '[a b c d e]'
SELECT highlight(ft, 0, '[', ']') FROM ft WHERE ft MATCH 'a+b+c AND c+d+e';

we get the expected result:

[a b c] x [c d e]
[a b c] [c d e]
[a b c d e]

If we instead use the trigram tokenizer with a similar example:

CREATE VIRTUAL TABLE ft2 USING fts5(a, tokenize="trigram");
INSERT INTO ft2 VALUES('abc x cde');
INSERT INTO ft2 VALUES('abc cde');
INSERT INTO ft2 VALUES('abcde');
SELECT highlight(ft2, 0, '[', ']') FROM ft2 WHERE ft2 MATCH 'abc AND cde';

Now there is an issue in the third row: the two matches overlap, but they are not combined into a single highlighted range:

[abc] x [cde]
[abc] [cde]
[abc]de[cde]

Is this a known issue? Does anyone know of any workarounds?
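
One possible application-side workaround, as a rough sketch only: skip highlight() for trigram tables, locate the query terms in the column text yourself, merge any overlapping spans, and then insert the markers. The helper names below (find_spans, merge_spans, highlight_merged) are hypothetical and not part of SQLite or FTS5, and the case-insensitive substring search is an assumption that only approximates trigram matching for simple terms.

```python
def find_spans(text, terms):
    """Return (start, end) spans for every occurrence of each term in text.

    Assumption: a case-insensitive substring search stands in for what the
    trigram tokenizer matches. This is not FTS5's own matching logic, and it
    assumes lower() does not change the string length (true for ASCII).
    """
    lowered = text.lower()
    spans = []
    for term in terms:
        needle = term.lower()
        if not needle:
            continue
        start = lowered.find(needle)
        while start != -1:
            spans.append((start, start + len(needle)))
            # Step forward by one so overlapping occurrences are also found.
            start = lowered.find(needle, start + 1)
    return spans


def merge_spans(spans):
    """Merge overlapping spans into a sorted list of disjoint spans.

    Spans that merely touch (one ends where the next begins) stay separate.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def highlight_merged(text, terms, open_mark="[", close_mark="]"):
    """Wrap each merged span of text in open_mark / close_mark."""
    out = []
    pos = 0
    for start, end in merge_spans(find_spans(text, terms)):
        out.append(text[pos:start])
        out.append(open_mark + text[start:end] + close_mark)
        pos = end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    for row in ("abc x cde", "abc cde", "abcde"):
        print(highlight_merged(row, ["abc", "cde"]))
```

For the three rows above this prints [abc] x [cde], [abc] [cde] and [abcde]. The terms have to be supplied by the application, since this sketch does not parse the MATCH expression.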

(2) By Dan Kennedy (dan) on 2023-10-24 16:08:21 in reply to 1

Thanks for reporting this. It was a bug of course. Should now be fixed here:

https://sqlite.org/src/info/e952db86faaafd2e

Dan.

(3) By anonymous on 2023-10-25 10:57:36 in reply to 2

Thanks Dan, much appreciated!