Request for general guidance on approach to searching text for word combinations.

(1.2) By Gary (1codedebugger) on 2021-08-27 01:54:57 edited from 1.1 [link] [source]

Hello, I have a novice question concerning searching for words and word combinations within strings of text.

The table includes translation_no, book_no, chapter_no, verse_no, text. I'd like to be able to return all rows in which, for example, a variable number of specific words occur in any order in the text of a particular translation; and I'd like to return the results in order of all verses that have the words in exact order, then the words sequentially in any order, and, after that, it really doesn't matter much apart from being in book_no, chapter_no, verse_no order.

I assume it would be very inefficient to search through the text of every row in this table for each request.  The text column ranges in length from 11 to 556 characters.

Should I build a table that is the equivalent of an electronic concordance, where there is one row for each unique word that occurs across all rows of text, and a list of every combination of book_no, chapter_no, verse_no, word_no; and then examine the intersection of the three in non-SQL code (currently using Tcl)?

Or, have multiple rows for every unique word with its book_no,..., word_no values, and determine the result from within SQLite? Perhaps, a CTE that returns a table for each individual word, and then select all rows that have the same book_no, chapter_no, verse_no as a start?

Or is there a better approach? Thank you for any guidance you may be able to provide.

(2) By Jim Morris (jimmorris33215) on 2021-08-27 01:10:22 in reply to 1.1 [source]

Hi, What I think you want is Full Text Search, You can probably search on FTS and SQLite.

(3) By Gary (1codedebugger) on 2021-08-27 02:16:18 in reply to 2 [link] [source]

Thank you. That is a very useful tool, it appears, especially for content that changes often.

I wonder how it compares to the concordance method since my content isn't going to change and each row is rather small. I've been working on that and there are issues of punctuation and words in italics used for smooth translation that aren't really in the original manuscripts.  The FTS extension might handle that automatically.  I'll have to experiment.

Thanks again.