request tags from a substring
(1) By vite (vi-618) on 2023-08-16 15:55:44 [link] [source]
Hi I have a simple database with one table which includes a 'keywords' column. I need to build a query to get records by keywords. But, the keywords are stored as a string, which looks like this: a,b,c,d,e
I originally used the LIKE
operator, but it's not very reliable in some combinations. Then, I discovered a more reliable recursive query mechanism, for example:
WITH RECURSIVE
cnt(token,reman) AS (
SELECT NULL, 'a,b,c,d,e'||','
UNION ALL SELECT
substr(reman,0,instr(reman,',')),
substr(reman,instr(reman,',')+1)
FROM cnt WHERE reman!=''
) SELECT token FROM cnt WHERE token=='a' OR token=='b';
However, I need to get related records via the logical AND
operator, but unfortunately I can't find a suitable example. I would appreciate any help.
Thanks
(2) By Ryan Smith (cuz) on 2023-08-16 19:05:00 in reply to 1 [link] [source]
It is not clear what you mean at all.
Are you saying that you have a table with some fields, and one of the fields contains a comma-separated list of keywords, and that you would like to select rows from this table where those keywords match one or more given values? (The explanation sounds like this option, but the Recursive query won't work for it at all),
OR,
are you saying that you have a table with among other fields, a field in which a single Keyword appears, but then you need to lookup the rows by keyword based on a given value string that can contain multiple keywords separated by commas? (Your description does not sound like this option, but the Recursive query can work for it).
I'm going to take a wild guess that you mean the former and suggest one way of doing it. Consider the following example table:
ID | Name | Somefield | Keywords
----|-------|-----------|---------
1 | Joe | 123 | Pisces,Water,Dolphin,Gray
2 | Jane | 15 | Cancer,Water,Crab,Red
3 | Jimmy | 23 | Aries,Land,Ram,Gray
4 | Jenny | 321 | Taurus,Land,Steer,Red
One way to query the rows that contain a value would be:
SELECT * FROM t
WHERE instr(',' || Keywords || ',' , ',Gray,') > 0
;
ID | Name | Somefield | Keywords
----|-------|-----------|---------
1 | Joe | 123 | Pisces,Water,Dolphin,Gray
3 | Jimmy | 23 | Aries,Land,Ram,Gray
And a way to query the rows that contain more than one value would be simply:
SELECT * FROM t
WHERE instr(',' || Keywords || ',' , ',Red,' ) > 0
AND instr(',' || Keywords || ',' , ',Land,') > 0
;
-- OR --
SELECT * FROM t
WHERE (',' || Keywords || ',') LIKE '%,Red,%'
AND (',' || Keywords || ',') LIKE '%,Land,%'
;
ID | Name | Somefield | Keywords
----|-------|-----------|---------
4 | Jenny | 321 | Taurus,Land,Steer,Red
etc. If you are quite sure all the keywords are unique in all parts, you can do away with the comma shenanigans, but it's almost always necessary since the test for '%port%' will find all of 'port', 'import' or 'important', whereas the test for '%,port,%' will only find 'port' and not the others.
If none of this seems to be what you are looking for, kindly be much more specific with your question, possibly adding example data and what you expect a query to return.
(3.1) By vite (vi-618) on 2023-08-16 20:10:48 edited from 3.0 in reply to 2 [link] [source]
Thanks for your reply. You're right, the question is not well-defined. Here is a list of keywords for example:
1|computer-sci,systems
2|computer-sci,programming,language,c
3|c
Note entry 3, where the single keyword 'c' is used. In this case, your example does not work correctly. I used the following query:
SELECT id,keywords FROM entries
WHERE instr(',' || keywords || ',' , ',computer-sci,' ) > 0
AND instr(',' || keywords || ',' , ',c,') > 0;
(4.1) By vite (vi-618) on 2023-08-16 20:39:25 edited from 4.0 in reply to 3.0 [link] [source]
I'm sorry, that's right. The query condition was successful.
Thanks for your help my friend.
(5) By Gunter Hick (gunter_hick) on 2023-08-17 05:27:34 in reply to 1 [link] [source]
The easiest way to work around such denormalization is to write an ephemeral only table valued function to split the keyword list into keywords with an internal schema of CREATE TABLE keys (key TEXT, input TEXT hidden); Then you can SELECT key FROM keys WHERE input='a,b,c,d'; or, shorter SELECT keys('a,b,c,d'); or SELECT keys(t.keywords) k,... FROM table t WHERE k='a' OR k='b'...; For proper normalization, you would have a table of keywords and a relation joining them to your data. CREATE TABLE keys (id INTEGER PRIMARY KEY, key TEXT); CREATE TABLE keys_in_table (keyid INTEGER, tableid INTEGER, FOREIGN KEY (keyid) REFERENCES keys(id), FOREIGN KEY (tableid) REFERENCES table(id)); Your query then becomes a simple JOIN.
(6) By vite (vi-618) on 2023-08-17 07:22:05 in reply to 5 [source]
Thanks for your reply. I don't understand the specifics of a SQL query very well to properly evaluate performance. Probably the scheme you proposed will be the optimal solution for processing big data. I will most likely do so in the future.
(7) By Ryan Smith (cuz) on 2023-08-17 08:28:21 in reply to 6 [link] [source]
It is definitely worth looking into and getting familiar with the specifics.
Note that my example makes what you wish to do "possible", but certainly not fast. What Gunther suggested is really the correct way to do it, much much faster, and how it is commonly done in practice, so being familiar with that method will serve you well in future.
Getting your current data into a more normalized shape can be done with queries easily (feel free to ask for assistance with that here), but the real boon will be to make your application or whatever is pushing the data into the DB, to already do it correctly by adding the keywords to the linked table.
A bit of googling about "linking tags in db data" and "linked list tables" and the like will get you a wealth of examples and theory around how to best do it.