Retrieve match context with FTS5
(1) By David G.F. (davidgf) on 2020-04-08 21:43:34 [link] [source]
I'm trying to make this work as in the docs. I'm using a table defined as:
CREATE VIRTUAL TABLE IF NOT EXISTS doc_fts USING fts5(doctext, content_rowid='docid', detail=full)
And inserting records like:
INSERT INTO doc_fts(rowid, doctext) VALUES (?, ?);
The actual data is in another table, where the docid from this table matches the rowid from the other table. However if I want to get the context of the match:
SELECT highlight(doctext, 2, '<b>', '</b>') FROM doc_fts WHERE doctext MATCH 'test';
I get a "weird" error: Result: no such column: T.docid
Since I think this might be due to the content_rowid, I tried to come up with a smaller testcase:
CREATE VIRTUAL TABLE IF NOT EXISTS doc_fts USING fts5(doctext); INSERT INTO doc_fts (doctext) values ("hello world"); SELECT highlight(doctext, 2, '<b>', '</b>') FROM doc_fts WHERE doctext MATCH 'hello';
The result of the select is: Error: no such cursor: 0
Which leaves me puzzled :)
Any ideas? Thanks!
(2) By Dan Kennedy (dan) on 2020-04-09 11:23:51 in reply to 1 [source]
The content_rowid option doesn't make much sense without a content option to name the external table. The error message is a bit underwhelming, I agree.
I think the CREATE VIRTUAL TABLE statement should probably be rejected. Or perhaps the content_rowid option just ignored if there is no content option. Will look into that.
In the second case, the error is because the first argument to highlight() should be doc_fts, not doctext:
SELECT highlight(doc_fts, 2, '<b>', '</b>') FROM doc_fts WHERE doctext MATCH 'hello';
(3) By David G.F. (davidgf) on 2020-04-13 18:50:57 in reply to 2 [link] [source]
Hey thanks for your answer! I think I'm making more sense now. Can you elaborate on how you would do this correctly?
Should I just add content=other_table_name? Thanks!
(4) By David G.F. (davidgf) on 2020-04-13 19:13:05 in reply to 3 [link] [source]
Just to clarify, this is how (simplified) I tipially use my database: CREATE TABLE IF NOT EXISTS docs (docid integer primary key, title text, url text) WITHOUT ROWID; CREATE VIRTUAL TABLE IF NOT EXISTS doc_fts USING fts5(doctext, content=docs, content_rowid=docid, detail=full); INSERT INTO docs(docid, title, url) VALUES (1234, "title1", "url1"); INSERT INTO doc_fts(rowid, doctext) VALUES (1234, "hello there test foo bar"); SELECT title, url FROM doc_fts, docs WHERE doctext MATCH 'test' AND docs.docid == doc_fts.rowid; So I have a table with docs and a FTS table, for a given set of words I can fetch the docs I want (and their associated info). I'm trying to use highlight without much luck on it. There's something fundamental I'm not understanding on how an FTS5 table works behind the curtains, in particular how the content and content_rowid options work. I also don't understand the advantage of using such options, since I can always use rowid in the FTS table to point to an external table. Ideally I'd just like my SELECT query to also return the highlighted text, but it seems that I can't get it to work if I use content/content_rowid. If I remove the content table it all works: CREATE VIRTUAL TABLE IF NOT EXISTS doc_fts USING fts5(doctext); INSERT INTO doc_fts(rowid, doctext) VALUES (1234, "hello there test foo bar"); SELECT rowid, highlight(doc_fts, 0, "<b>", "</b>") FROM doc_fts WHERE doctext MATCH 'test'; Many thanks!
(5.1) Originally by Dan Kennedy (dan) with edits by Richard Hipp (drh) on 2020-04-15 11:26:18 from 5.0 in reply to 4 [link] [source]
I don't think you want to use those options.
FTS5's job is more or less to create a (term -> list-of-rowids) mapping for the set of documents that are inserted into an FTS5 table. Usually, the FTS5 table also stores the original text. But if you want to save space, you can opt out of that by specifying 'content=""'. If you do this, of course you can't retrieve the original text when the FTS5 module needs it. For example to return the result of highlight(), or to delete a row from the FTS5 table (you need the original text to find the list of terms to delete from the index).
So there's a third option. If you use the content option specify a table, virtual table or view name, then every time the FTS5 module requires the original text for an entry, it runs:
SELECT * FROM $content WHERE $content_rowid = ?
to retrieve it. Where of course $content is the value of the content= option, and $content_rowid the value of content_rowid.
This can come in handy when you want to index data that is accessed via a virtual table. Or you want to use a view to ensure only a subset of the tables data is indexed. There are other, more esoteric, cases too.
(6) By ddevienne on 2020-04-15 09:04:01 in reply to 5.0 [link] [source]
Sounds quite interesting Dan.
Is this documented anywhere in more details please?
(7) By Dan Kennedy (dan) on 2020-04-15 14:37:08 in reply to 6 [link] [source]
(8) By ddevienne on 2020-04-15 15:45:45 in reply to 7 [link] [source]
Great. Thank you Dan. --DD