SQLite Forum

FTS5 rows not searchable with match
Login

FTS5 rows not searchable with match

(1.1) Originally by anonymous with edits by Dan Kennedy (dan) on 2020-09-21 06:59:48 from 1.0 [link] [source]

Hello,

I have an FTS5 table using a view as an external content table, kept up to date with triggers. Defined as follows:

CREATE VIEW item_view AS SELECT item.id, item.title, item.series, (SELECT Group_Concat(creator.name, ', ') FROM creator JOIN item_creator ON item_creator.creator_id = creator.id AND item_creator.item_id = item.id) as creator, publisher.name as publisher, item.year, (SELECT Group_Concat(subject.name, ', ') FROM subject JOIN item_subject ON item_subject.subject_id = subject.id AND item_subject.item_id = item.id) as subject, item.description FROM item LEFT JOIN publisher ON publisher.id = item.publisher_id WHERE item.hidden = FALSE
/* item_view(id,title,series,creator,publisher,year,subject,description) */;

CREATE TRIGGER __item_search_ai INSTEAD OF INSERT ON item_view BEGIN
INSERT INTO item_search(rowid, title, series, creator, publisher, year, subject, description) VALUES(new.id, new.title, new.series, new.creator, new.publisher, new.year, new.subject, new.description);
END;
CREATE TRIGGER __item_search_ad INSTEAD OF DELETE ON item_view BEGIN
INSERT INTO item_search(item_search, rowid, title, series, creator, publisher, year, subject, description) VALUES('delete', old.id, old.title, old.series, old.creator, old.publisher, old.year, old.subject, old.description);
END;
CREATE TRIGGER __item_search_au INSTEAD OF UPDATE ON item_view BEGIN
INSERT INTO item_search(item_search, rowid, title, series, creator, publisher, year, subject, description) VALUES('delete', old.id, old.title, old.series, old.creator, old.publisher, old.year, old.subject, old.description);
INSERT INTO item_search(rowid, title, series, creator, publisher, year, subject, description) VALUES (new.id, new.title, new.series, new.creator, new.publisher, new.year, new.subject, new.description);
END;

CREATE VIRTUAL TABLE "item_search"
USING fts5(title, series, creator, publisher, year, subject, description,
tokenize="unicode61 remove_diacritics 2 tokenchars '-_.'",
content='item_view',
content_rowid='id')
/* item_search(title,series,creator,publisher,year,subject,description) */;

I've noticed that at least sometimes, I'm not under what conditions, match is not able to find some rows yet I can query them from the FTS5 table by selecting rowid. Issuing a rebuild will fix it but I am not sure why new entries are not always automatically searchable with match.

(2) By anonymous on 2020-09-21 02:28:50 in reply to 1.0 [source]

Running an integrity check on the FTS5 table, INSERT INTO item_search(item_search) VALUES('integrity-check'), I get:

database disk image is malformed

such happened a day after rebuilding the table. I'm running WAL mode with synchronous = NORMAL. I'm rebuilding now and will try synchronous = FULL for a little while.

(3) By Dan Kennedy (dan) on 2020-09-21 07:15:45 in reply to 2 [link] [source]

It looks like the 'integrity-check' function compares the contents of the FTS5 index with the contents of the external content table. So in this case it is saying that the contents of the index is inconsistent with the contents of your content table (the view "item_view"). This doesn't seem right to me - internal inconsistencies in data stored by FTS5 itself are corruption, not inconsistency with data stored in an external content table.

With an external content table, querying the FTS5 table using anything other than a full-text query is pretty much the same as querying the external content table directly. FTS5 just passes the query through.

So the contents of your view and the contents of the FTS5 table are inconsistent. How they got that way is unclear - presumably the app modified the tables underlying the view but did not update the FTS5 index, either directly or via the triggers on the view.

(4) By anonymous on 2020-09-22 20:39:20 in reply to 3 [link] [source]

I noticed a patch for what you mentioned. Thank you. I also saw something earlier about FTS5 corruption; unsure if that might in some way be related.

Thinking about how inconsistency could occur, as I have been developing the app, I often before would build/run from my IDE which would force quit the running instance. Perhaps writes were done w/o view and trigger updates.

In my case, I am using the FTS5 table merely for match of certain columns – title, creator/author, publisher, description, etc. For display in app, or other purposes, all columns are retrieved from the tables themselves with a join. In such a case, would it be better to use a contentless table?

Concerning updates to FTS5, if in the case current tables and view differ from FTS, upon an update, if it is trying to incorrectly delete data, old.column values, do not exist, as mentioned in the docs "leave the full-text index in an unpredictable state," perhaps that is an issue here. Related to that, I recall seeing some use update/delete triggers that did not specify old.values but merely rowid. Is such ok or from how I interpret the docs, maybe it isn't.

Thanks for the help.

(5) By anonymous on 2020-09-26 01:09:15 in reply to 4 [link] [source]

It seems that external content may be my only option. Perhaps I made an error, I had tried a test of a normal w/content table yet maybe triggers won't insert into such. Running rebuild didn't fill the table.

Running a rebuild again on my contentless setup, I didn't test immediately afterwards, but shortly after, new records added to the FTS5 remain unsearchable with match. Perhaps you have some idea of what might be the issue?

I had tried FTS4 the other day. Perhaps I got the CREATE TABLE incorrect, or the difference in triggers was in error, yet table rows were not searchable with match after initial rebuild. Such is a thing I'll test more and maybe FTS4 is more suitable as perhaps there are unknown circumstances that can cause what I'm seeing with FTS5.

Concerning my DB, 3+ million records, w/indices and FTS5 table, about 10 gigs. It's possible that at times, there are frequent sustained bouts of hours where view updates that would cause FTS inserts or updates are occurring at about up to 10 per second.

I recall some list thread that one needs to issue sqlite3_close_v2 for some FTS updates. That makes me think maybe FTS updates from a trigger might not be part of of table transactions? I'm going on the possibility that maybe updates/deletes specifying old.values might not match what's in the table, and such may be a possible cause. Yet I haven't seen issues with finding records existing prior to a rebuild.

The unpredictable state of the index mentioned when one specifies incorrect old.values, maybe that's it. I'm not sure. Maybe not as many are using external content and as such this hasn't been seen before? I assume yet I am unsure if row updates of columns not in the view, that result in the view row being the same, does the view row get removed and re-added, and the trigger run again? I assume not yet it's good to be sure.

I really have no idea, yet if it is the case that FTS5 updates via trigger are not part of the transaction, or if there's something else, maybe some change is warranted? Is there some chance that table/view data could have been updated without the trigger having been run, such that on the next update, it is not possible to specify correct old.values to delete? It may not be always possible to ensure a connection is closed, if such is necessary. Apps can crash, on mobile, at least on iOS, apps put into the background can be closed and flushed from memory, so that what may need to run doesn't. If there is something about external content that makes sync in some cases difficult, not sure what can be done.

I'm sort of at a loss. Perhaps the upcoming build with FTS fixes might in some way be related. Maybe I'll have better luck with FTS4, and creating a test db with a subset of data would let me test it out. Besides that, not sure what to do.

(6) By Wout Mertens (wmertens) on 2020-09-26 07:25:30 in reply to 5 [link] [source]

I propose you create a test case too research your issues. Triggers are part of the transaction - how could they not be?

Data not being searchable even after triggers ran is weird.

We don't use triggers, we manage the FTS data from the application so we have more control. We tried contentless, but it is hard to manage since you need to provide the previous content on every update.

We don't encounter the problems you're seeing, in any case.

(7.1) By Dan Kennedy (dan) on 2020-09-26 08:16:31 edited from 7.0 in reply to 5 [link] [source]

Is there some chance that table/view data could have been updated without the trigger having been run, such that on the next update, it is not possible to specify correct old.values to delete?

Your first post contained triggers on views. e.g.

    CREATE TRIGGER __item_search_ai INSTEAD OF INSERT ON item_view BEGIN
        INSERT INTO item_search(rowid, title, series, creator, publisher, year, subject, description) VALUES(new.id, new.title, new.series, new.creator, new.publisher, new.year, new.subject, new.description);
    END;

You know this trigger runs only when the client (or some other trigger) executes a statement that begins with "INSERT INTO item_view...", right?

More specifically, you're aware that triggers on views do not automatically run whenever the tables used by the view are modified such that the contents of the view changes?

Dan.

(8) By anonymous on 2020-09-26 11:37:46 in reply to 7.1 [link] [source]

Dan,

Please forgive me as I am not quite an expert in either SQL or SQLite. My knowledge is not bad but it could be much better.

I'm not sure what to make of your questions. Checking a few new entries in my main table from a few hours ago, they are all in the view with correct data, that from what I understand view entries are kept up to date from the view definition, which I interpreted as an INSERT INTO item_view.

Checking the FTS5 table, new rows are also there, with all columns correct. As such, I thought the triggers were correct. I thought that that the triggers were automatically run. As such, I am puzzled why they are there yet not searchable by FTS; I can find them with SELECT … where rowid = ?; I cannot with SELECT * from item_search where item_search match ? nor with my used search queries.

I understand that explaining such may not be enjoyable. As they say, you have much better things to do. Myself, I tend not to get into conversations about things I am more learned about when others are less so. Perhaps I have the capability and interest to eventually have some level of SQL knowledge of someone who has done it for a few years or more, yet I have not yet put in such time. From what I can tell at this moment, I may have to myself run statements to insert, delete, and update the view any time I modify other tables that would potentially change the view and desired FTS data. Is that correct? Yet the rows are already automatically populated into the FTS table so there is something I do not understand.

I look into it more. Maybe I'll find the answer; maybe it'll take a bit or a lot of time. I can find only one possibly relevant thread in the old list. If you have some other suggestion about some ideal setup of view (used mainly for in app display as you can deduce), triggers, or possibly other, that would best simplify desired results, I'd very much appreciate some guidance.

(9) By Wout Mertens (wmertens) on 2020-09-26 12:14:31 in reply to 8 [link] [source]

Silly question: does contentless FTS5 even work on a view? Doesn't it always need a rowid?

(10) By anonymous on 2020-09-26 12:51:38 in reply to 9 [link] [source]

I had meant FTS external content in any previous mention of 'contentless' save one time I had mentioned using such in place of my current external content.

I think I may understand now. The triggers do automatically run from what I can tell, as they have, but INSERT INTO an FTS external content table is somehow not the same as inserting into a normal FTS. The rows are there but the indices are not built. Maybe such is in the docs and I missed it.

Maybe it is possible to achieve what I can from just correctly defined triggers? I've tried this, and so far in a few minutes of testing, it works. The idea, if correct is, that on changes to the view, the trigger takes the insert, delete, or update from the view itself and then re-inserts into or deletes such values from the view.

CREATE TRIGGER __item_search_ai INSTEAD OF INSERT ON item_view BEGIN INSERT INTO item_view(id, title, series, creator, publisher, year, subject, description) VALUES(new.id, new.title, new.series, new.creator, new.publisher, new.year, new.subject, new.description); END; CREATE TRIGGER __item_search_ad INSTEAD OF DELETE ON item_view BEGIN INSERT INTO item_search(item_search, rowid, title, series, creator, publisher, year, subject, description) VALUES('delete', old.id, old.title, old.series, old.creator, old.publisher, old.year, old.subject, old.description); END; CREATE TRIGGER __item_search_au INSTEAD OF UPDATE OF title, series, creator, publisher, year, subject, description ON item_view BEGIN INSERT INTO item_search(item_search, rowid, title, series, creator, publisher, year, subject, description) VALUES('delete', old.id, old.title, old.series, old.creator, old.publisher, old.year, old.subject, old.description); INSERT INTO item_view(id, title, series, creator, publisher, year, subject, description) VALUES(new.id, new.title, new.series, new.creator, new.publisher, new.year, new.subject, new.description); END;

If such is correct, it sure has been as they say a headache getting it. In the process of learning more SQL and SQLite than I had previously known – joins, subqueries, views, triggers, FTS, etc., with enough effort indeed one could piece such together but missing the key parts from the docs that one could miss in trying to learn the subjects overall, I didn't figure it out. There isn't much info nor examples of others using external content, so it has been a struggle. It seems correct, but maybe it's not. I'll continue more tests; yet so far it seems ok.

(11) By TripeHound on 2020-09-26 14:16:33 in reply to 10 [link] [source]

Is:

CREATE TRIGGER __item_search_ai INSTEAD OF INSERT ON item_view BEGIN INSERT INTO item_view(...

correct, or should it – to match the other triggers – be ...INSERT INTO item_search(...``?

(12.1) By Dan Kennedy (dan) on 2020-09-26 15:21:04 edited from 12.0 in reply to 8 [link] [source]

Whatever I said to imply you were being obtuse I apologize for.

When you update the tables in a view, those changes are reflected in the view immediately. So that is expected. However, the triggers on the view are not run and so the FTS5 index is not updated.

If you query the FTS5 table by rowid, or just using "SELECT * FROM item_search", you will see the new rows. This is because FTS5 passes all such queries directly through to the underlying view or table. It only uses the FTS index - the thing that is not being updated - when you do a full-text query. Hence you can see the rows for non full-text queries, but not for those that use MATCH (or the other, equivalent, table-value function syntax).

From what I can tell at this moment, I may have to myself run statements to insert, delete, and update the view any time I modify other tables that would potentially change the view and desired FTS data. Is that correct?

Entirely correct.

But really, why do it? If you're going to have to do separate update/delete/insert statements on the view, why not drop the triggers and run the required update/delete/insert statements on the fts5 table directly?

Dan.

(13) By anonymous on 2020-09-27 19:25:48 in reply to 12.1 [link] [source]

… FTS5 passes all such queries directly through to the underlying view or table. It only uses the FTS index - the thing that is not being updated - when you do a full-text query. Hence you can see the rows for non full-text queries, but not for those that use MATCH (or the other, equivalent, table-value function syntax).

I get it now. It took a little while but now, everything, at least related to my struggles, is clear.

If you're going to have to do separate update/delete/insert statements on the view, why not drop the triggers and run the required update/delete/insert statements on the fts5 table directly?

Indeed. After thinking about it, somehow I was still stuck on the idea of triggers, but after accepting the idea of running the needed statements myself, it took more time to overcome my mis-understanding of view triggers still, and somehow thinking triggers were integral to FTS, than to come up with a rough coded outline of what to do.

I had to reorganize a bit of code, to calculate an old FTS row, make needed table changes, calculate new, compare and update if needed, to do such in a transaction in one place to make it easier to track, such merely took the afternoon to figure out all the needed pieces and getting working. I'm sure it now works. Not bad at all, I was just missing pieces and had mis-understandings that prevented me from figuring it out all this time.

Thank you very much Dan.