Filter one column for multiple values
(1) By Murtaza (murtazanw) on 2021-07-04 08:59:53 [link]
I am very new to SQLite and still learning. I have imported a csv file called 'CATALOGUE.csv' to a table called 'ctl' and running a query to filter the column 'partner_sku' for strings of certain pattern as below; SELECT partner_sku FROM ctl WHERE partner_sku LIKE 'A1012%' OR partner_sku LIKE 'A207%' OR partner_sku LIKE 'KH108%'; For few number of criteria in this query is alright, but as number grows it looks mess to handle. I read about WHERE with IN CLAUSE but don't know how to incorporate that here. Need your kind help.
(2) By Larry Brasfield (larrybr) on 2021-07-04 10:17:19 in reply to 1
>I am very new to SQLite and still learning. That should be fun. I will presume you're new to SQL as well. > I have imported a csv file called 'CATALOGUE.csv' to a table called 'ctl' and running a query to filter the column 'partner_sku' for strings of certain pattern as below; I almost cut that table source as not germane. > <code>SELECT partner_sku FROM ctl WHERE partner_sku LIKE 'A1012%' OR partner_sku LIKE 'A207%' OR partner_sku LIKE 'KH108%'; </code>For few number of criteria in this query is alright, but as number grows it looks mess to handle. Your instincts are right on this. If that WHERE clause represents just a happenstance set of criteria, then a somewhat repetitive condition is harmless. But if those LIKE terms represent a set derived logically from other data, there should probably be another table (or more) so that you can express your selection as a join. (See https://sqlite.org/optoverview.html#joins .) I also wonder if your "partner_sku" is not a composition. Seeing its values' leading letters makes me suspect they encode information that should occupy more than one column in a table. That refinement might reduce the need for extensive WHERE clauses. If this makes no sense yet, you probably should study "database normalization". It's a big subject, but important for serious database work. Even a short tutorial on it could benefit your schema design. > I read about WHERE with IN CLAUSE but don't know how to incorporate that here. Need your kind help. If you truly need LIKE expressions, (after such normalization as you see fit to perform), they are not going to go into an IN clause in a simple way.
(3) By Murtaza (murtazanw) on 2021-07-04 12:28:01 in reply to 2 [link]
Thank you so much LARRY BRASTFIELD for your quick reply, understanding of the matter and kind suggestions. I would definitely dive deep into JOIN commands as well as DATABASE NORMALIZATION to explore more about handling data efficiently. As you pointed out the need of another table having set of criteria and then running a query with JOIN command, the partner_sku is basically a uniquie product id in this catalogue table and other columns are having some related information like price, barcode etc. The first 3-4 letters of partner_sku are similar for a bunch of items because they are variants of same item which is the basis of my filtration say 'A1012'. Look below the few result of my query. I am trying here to pass the query with few letters of some items to get desired set of product_sku. I believe the above make some sense. +--------------+ | partner_sku | +--------------+ | A1012-20_BGN | | A1012-20_BLK | | A1012-20_BLU | | A1012-20_ROS | | A1012-20_PNK | | A1012-24_BGN | | A1012-24_BLU | | A1012-24_ROS | | A1012-24_PNK | | A1012-24_BLK | | A1012-28_BLU | | A1012-28_BGN | | A1012-28_PNK | | A1012-28_ROS | | A1012-28_BLK | | A1012-3_BGN | | A1012-3_BLU | | A1012-3_BLK | | A1012-3_ROS | | A1012-3_PNK | | A1012-32_ROS | | A1012-32_BLU | | A1012-32_BGN | | A1012-32_BLK | | A1012-32_PNK | | A1012-4_ROS | | A1012-4_BLK | | A1012-4_BLU | | A1012-4_PNK | | A1012-4_BGN |
(4) By Igor Tandetnik (itandetnik) on 2021-07-04 13:43:11 in reply to 3 [link]
I see that `A1012` is followed by a dash. Is this also the case with `A207` and `KH108`? If so, you could do something like WHERE substr(partner_sku, 1, instr(partner_sku, '-')-1) IN ('A1012', 'A207', 'KH108') --- Now, the point about normalization is this. It appears that the prefix of `partner_sku` has a special meaning. In this case, the database may benefit from breaking it out into a separate column, say `base_model_id`, that you can then directly search on.
(5.1) By Keith Medcalf (kmedcalf) on 2021-07-04 23:23:14 edited from 5.0 in reply to 4 [link]
This will, of course, require a full scan so it will work just fine if the table `ctl` has 10 rows (such as when testing) but will not be performant if the table `ctl` has a billion rows -- depending on if other conditions in the where clause can cull the candidates which need testing to a relatively small number before this condition is tested. If you need to do this regularly you might create an an index as follows to address this: ``` create index ctl_idx on ctl (substr(partner_sku, 1, instr(partner_sku, '-')-1)); ``` after which: ``` select * from ctl where substr(partner_sku, 1, instr(partner_sku, '-')-1) in ('A1012', 'A207', 'KH108'); ``` will use the index to find just the rows you are interested in. Alternatively, you could add a virtual column to the table containing the prefix and index that, and use the virtual column in the where clause instead: ``` alter table ctl add column partner_sku_prefix text as (substr(partner_sku, 1, instr(partner_sku, '-')-1)); create index ctl_idx on ctl(partner_sku_prefix); ``` and then ``` select * from ctl where partner_sku_prefix in ('A1012', 'A207', 'KH108'); ``` which will do the same thing, but you will not need to keep typing the long expression whenever you need to access the partner_sku_prefix.
(6) By Murtaza (murtazanw) on 2021-07-05 08:48:29 in reply to 4 [link]
Dear Igor Tandetnik, You guessed it perfectly right, the partner_sku is comprised of and combined with '-', the former part is family of items and the latter one is the variant. What I was trying to do is filtering those values based on their item family say 'A1012', 'A207' etc. Thank you so much. It worked like a charm. I seriously implemented your suggestion of having a separate column (actually a separate table) having item's family. Here are the steps I performed; sqlite> .import catalogue.csv ctl sqlite> CREATE TEMP TABLE item AS SELECT partner_sku , SUBSTR ( partner_sku , 1 , INSTR ( partner_sku , '-' ) - 1 ) AS item FROM ctl; sqlite> SELECT partner_sku, sku_child FROM ctl INNER JOIN item USING (partner_sku) WHERE item IN ( 'A1012' , 'A207' , 'KH108', 'KH247' , 'KH132' , 'KH134' ); And here is the result; partner_sku,sku_child A1012-20_BGN,N31450486A A1012-20_BLK,N31450462A A1012-20_BLU,N31450468A A1012-20_ROS,N31450474A A1012-20_PNK,N31450480A A1012-24_BGN,N31450487A A1012-24_BLU,N31450469A A1012-24_ROS,N31450475A A1012-24_PNK,N31450481A A1012-24_BLK,N31450463A A1012-28_BLU,N31450470A A1012-28_BGN,N31450488A A1012-28_PNK,N31450482A A1012-28_ROS,N31450476A A1012-28_BLK,N31450464A A1012-3_BGN,N31450490A A1012-3_BLU,N31450472A A1012-3_BLK,N31450466A A1012-3_ROS,N31450478A A1012-3_PNK,N31450484A A1012-32_ROS,N31450477A A1012-32_BLU,N31450471A A1012-32_BGN,N31450489A A1012-32_BLK,N31450465A A1012-32_PNK,N31450483A A1012-4_ROS,N31450479A A1012-4_BLK,N31450467A A1012-4_BLU,N31450473A A1012-4_PNK,N31450485A A1012-4_BGN,N31450491A A207-20_BGN,N31450456A A207-20_ROS,N31450444A A207-20_BLU,N31450438A A207-20_PNK,N31450450A A207-20_BLK,N31450432A A207-24_BLK,N31450433A A207-24_BGN,N31450457A A207-24_ROS,N31450445A A207-24_BLU,N31450439A A207-24_PNK,N31450451A A207-28_BLK,N31450434A A207-28_BGN,N31450458A A207-28_ROS,N31450446A A207-28_BLU,N31450440A A207-28_PNK,N31450452A A207-3_ROS,N31450448A A207-3_BLK,N31450436A A207-3_BLU,N31450442A A207-3_BGN,N31450460A A207-3_PNK,N31450454A A207-32_BLU,N31450441A A207-32_BGN,N31450459A A207-32_BLK,N31450435A A207-32_ROS,N31450447A A207-32_PNK,N31450453A A207-4_BLU,N31450443A A207-4_BGN,N31450461A A207-4_BLK,N31450437A A207-4_ROS,N31450449A A207-4_PNK,N31450455A KH108-20_BLU,N43895442A KH108-20_BLK,N12740940A KH108-20_BGN,N43895448A KH108-20_BRN,N43895436A KH108-20_BRW,N12740943A KH108-20_MRN,N12740949A KH108-20_NVY,N12740946A KH108-24_BGN,N43895449A KH108-24_BLU,N43895443A KH108-24_BLK,N12740941A KH108-24_BRN,N43895437A KH108-24_MRN,N12740950A KH108-24_NVY,N12740947A KH108-24_BRW,N12740944A KH108-28_BGN,N43895450A KH108-28_BLK,N12740942A KH108-28_BLU,N43895444A KH108-28_BRN,N43895438A KH108-28_MRN,N12740951A KH108-28_NVY,N12740948A KH108-28_BRW,N12740945A KH108-3_BGN,N43895452A KH108-3_BLU,N43895446A KH108-3_BRN,N43895440A KH108-32_BRN,N43895439A KH108-32_BGN,N43895451A KH108-32_BLK,N40910565A KH108-32_BLU,N43895445A KH108-32_BRW,N40910566A KH108-32_NVY,N40910567A KH108-4_BLU,N43895447A KH108-4_BRN,N43895441A KH108-4_BGN,N43895453A KH132-20_COF,N18602990A KH132-20_NVY,N18602992A KH132-20_BLK,N18602991A KH132-24_BLK,N18602994A KH132-24_COF,N18602993A KH132-24_NVY,N18602995A KH132-28_COF,N18602996A KH132-28_BLK,N18602997A KH132-28_NVY,N18602998A KH132-3_BLK,N18602988A KH132-3_COF,N18602987A KH132-3_NVY,N18602989A KH134-20_BLK,N18603015A KH134-20_SIL,N18603016A KH134-20_BGN,N42588980A KH134-20_NVY,N18603014A KH134-24_BGN,N42588981A KH134-24_BLK,N18603018A KH134-24_SIL,N18603019A KH134-24_NVY,N18603017A KH134-28_BLK,N18603021A KH134-28_BGN,N42588982A KH134-28_SIL,N18603022A KH134-28_NVY,N18603020A KH134-3_BLK,N18603012A KH134-3_BGN,N42588983A KH134-3_SIL,N18603013A KH134-3_NVY,N18603011A KH247-20_NVY,N18603058A KH247-20_BRW,N43895454A KH247-20_BLK,N18603057A KH247-20_BGN,N43895466A KH247-20_BRN,N18603055A KH247-20_MRN,N18603056A KH247-20_BLU,N43895460A KH247-24_BRW,N43895455A KH247-24_BLK,N18603061A KH247-24_BRN,N18603059A KH247-24_MRN,N18603060A KH247-24_BGN,N43895467A KH247-24_NVY,N18603062A KH247-24_BLU,N43895461A KH247-28_BRW,N43895456A KH247-28_BLU,N43895462A KH247-28_BLK,N18603065A KH247-28_NVY,N18603066A KH247-28_BRN,N18603063A KH247-28_BGN,N43895468A KH247-28_MRN,N18603064A KH247-3_BRW,N43895458A KH247-3_BLU,N43895464A KH247-3_BGN,N43895470A KH247-32_BLU,N43895463A KH247-32_BGN,N43895469A KH247-32_BLK,N18603069A KH247-32_BRW,N43895457A KH247-32_NVY,N18603070A KH247-32_MRN,N18603068A KH247-32_BRN,N18603067A KH247-4_BRN,N18603051A KH247-4_BRW,N43895459A KH247-4_BLU,N43895465A KH247-4_BGN,N43895471A KH247-4_BLK,N18603053A KH247-4_NVY,N18603054A KH247-4_MRN,N18603052A