SQLite Forum

Regression in snippet() function in 3.44.0

(1) By Kovid Goyal (kovidgoyal) on 2023-11-03 08:19:10

The snippet() function marks incorrect highlight extents in SQLite 3.44.0. Sample SQLite script:

CREATE VIRTUAL TABLE fts_table USING fts5(t, tokenize = 'unicode61 remove_diacritics 2');
CREATE VIRTUAL TABLE fts_row USING fts5vocab(fts_table, row);
INSERT INTO fts_table(t) VALUES ('你dont叫mess');
SELECT term,doc FROM fts_row;
SELECT snippet(fts_table, 0, '>', '<', '...', 4) FROM fts_table WHERE fts_table MATCH '叫';
Output with SQLite < 3.44.0:
dont|1
mess|1
你|1
叫|1
你dont>叫<mess
Output with SQLite 3.44.0:
dont|1
mess|1
你|1
叫|1
你dont>叫mess<

Notice that the trailing < is in the wrong position: it should come right after 叫, not after mess. Note that this script is not sufficient to reproduce the problem on its own, because it relies on a custom tokenizer (unicode61 here is overridden by a custom SQLite extension). However, the fts5vocab output shows that tokenization is correct and identical in both versions, so the issue must be in the snippet function.
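For anyone comparing the two outputs mechanically, here is a minimal Python sketch that pulls out the text between the highlight markers. The helper name highlighted_spans is hypothetical (it is not part of SQLite or calibre), and it only inspects the strings quoted above; it does not run FTS5 or the custom tokenizer.

```python
import re


def highlighted_spans(snippet, open_mark=">", close_mark="<"):
    """Return the text found between each open/close marker pair.

    Hypothetical helper for illustration only. The default markers
    match the '>' and '<' arguments passed to snippet() in the script.
    """
    pattern = re.escape(open_mark) + "(.*?)" + re.escape(close_mark)
    return re.findall(pattern, snippet, flags=re.DOTALL)


# Strings copied from the two outputs in the report.
before_3_44_0 = highlighted_spans("你dont>叫<mess")
with_3_44_0 = highlighted_spans("你dont>叫mess<")

print(before_3_44_0)  # only the matched token is highlighted
print(with_3_44_0)    # the highlight also swallows the following token
```

With the pre-3.44.0 output the only highlighted span is the matched term 叫; with the 3.44.0 output the span is 叫mess, which is the misplaced closing marker described above.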

The code of the tokenizer is here: https://github.com/kovidgoyal/calibre/blob/master/src/calibre/db/sqlite_extension.cpp

However, it's not standalone: it depends on ICU, the Snowball stemmer, etc. But since the tokenization is correct, it shouldn't matter.

If there is some more information I can provide, please ask.

(2) By Dan Kennedy (dan) on 2023-11-03 17:21:28 in reply to 1

Thanks for reporting this. Does it work after this change?

https://sqlite.org/src/info/8f5e9c192ff2820d

Thanks,

Dan.

(3) By Kovid Goyal (kovidgoyal) on 2023-11-04 06:27:53 in reply to 2

Yes, it does, thanks.

(4) By Kovid Goyal (kovidgoyal) on 2023-11-24 07:24:21 in reply to 2

This bug is still present in 3.44.1, and I don't see a mention of it being fixed in the release notes.

(5) By Dan Kennedy (dan) on 2023-11-24 11:09:44 in reply to 4

I don't think that one made the patch releases. It will be in 3.45.0 though.

Dan.
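Applications that cannot upgrade right away may want to detect whether the linked SQLite falls in the affected range. The following Python sketch is one way to do that. It assumes every 3.44.x release is affected, which this thread confirms only for 3.44.0 and 3.44.1, and it assumes the fix ships in 3.45.0 as stated above. The function name snippet_regression_possible is hypothetical.

```python
import sqlite3


def snippet_regression_possible(version_info=sqlite3.sqlite_version_info):
    """Return True if the SQLite version is in the range reported here.

    Assumption: all 3.44.x releases behave like 3.44.0 and 3.44.1,
    and 3.45.0 and later contain the fix.
    """
    return (3, 44, 0) <= tuple(version_info) < (3, 45, 0)


if __name__ == "__main__":
    # sqlite3.sqlite_version is the version of the linked SQLite library,
    # not the version of the Python module.
    print(sqlite3.sqlite_version, snippet_regression_possible())
```

A version check like this only flags a possibly affected library. Comparing the actual snippet() output, as in the original report, remains the more reliable test.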