SQLite Forum

Filter one column for multiple values
Login

Filter one column for multiple values

(1) By Murtaza (murtazanw) on 2021-07-04 08:59:53 [link] [source]

I am very new to SQLite and still learning.

I have imported a csv file called 'CATALOGUE.csv' to a table called 'ctl' and running a query to filter the column 'partner_sku' for strings of certain pattern as below;

SELECT partner_sku FROM ctl WHERE partner_sku LIKE 'A1012%' OR partner_sku LIKE 'A207%' OR partner_sku LIKE 'KH108%';

For few number of criteria in this query is alright, but as number grows it looks mess to handle.

I read about WHERE with IN CLAUSE but don't know how to incorporate that here. Need your kind help.

(2) By Larry Brasfield (larrybr) on 2021-07-04 10:17:19 in reply to 1 [link] [source]

I am very new to SQLite and still learning.

That should be fun. I will presume you're new to SQL as well.

I have imported a csv file called 'CATALOGUE.csv' to a table called 'ctl' and running a query to filter the column 'partner_sku' for strings of certain pattern as below;

I almost cut that table source as not germane.

SELECT partner_sku FROM ctl WHERE partner_sku LIKE 'A1012%' OR partner_sku LIKE 'A207%' OR partner_sku LIKE 'KH108%'; For few number of criteria in this query is alright, but as number grows it looks mess to handle.

Your instincts are right on this. If that WHERE clause represents just a happenstance set of criteria, then a somewhat repetitive condition is harmless. But if those LIKE terms represent a set derived logically from other data, there should probably be another table (or more) so that you can express your selection as a join. (See https://sqlite.org/optoverview.html#joins .)

I also wonder if your "partner_sku" is not a composition. Seeing its values' leading letters makes me suspect they encode information that should occupy more than one column in a table. That refinement might reduce the need for extensive WHERE clauses. If this makes no sense yet, you probably should study "database normalization". It's a big subject, but important for serious database work. Even a short tutorial on it could benefit your schema design.

I read about WHERE with IN CLAUSE but don't know how to incorporate that here. Need your kind help.

If you truly need LIKE expressions, (after such normalization as you see fit to perform), they are not going to go into an IN clause in a simple way.

(3) By Murtaza (murtazanw) on 2021-07-04 12:28:01 in reply to 2 [link] [source]

Thank you so much LARRY BRASTFIELD for your quick reply, understanding of the matter and kind suggestions.

I would definitely dive deep into JOIN commands as well as DATABASE NORMALIZATION to explore more about handling data efficiently.

As you pointed out the need of another table having set of criteria and then running a query with JOIN command, the partner_sku is basically a uniquie product id in this catalogue table and other columns are having some related information like price, barcode etc.

The first 3-4 letters of partner_sku are similar for a bunch of items because they are variants of same item which is the basis of my filtration say 'A1012'. Look below the few result of my query. I am trying here to pass the query with few letters of some items to get desired set of product_sku.

I believe the above make some sense.

+--------------+ | partner_sku | +--------------+ | A1012-20_BGN | | A1012-20_BLK | | A1012-20_BLU | | A1012-20_ROS | | A1012-20_PNK | | A1012-24_BGN | | A1012-24_BLU | | A1012-24_ROS | | A1012-24_PNK | | A1012-24_BLK | | A1012-28_BLU | | A1012-28_BGN | | A1012-28_PNK | | A1012-28_ROS | | A1012-28_BLK | | A1012-3_BGN | | A1012-3_BLU | | A1012-3_BLK | | A1012-3_ROS | | A1012-3_PNK | | A1012-32_ROS | | A1012-32_BLU | | A1012-32_BGN | | A1012-32_BLK | | A1012-32_PNK | | A1012-4_ROS | | A1012-4_BLK | | A1012-4_BLU | | A1012-4_PNK | | A1012-4_BGN |

(4) By Igor Tandetnik (itandetnik) on 2021-07-04 13:43:11 in reply to 3 [link] [source]

I see that A1012 is followed by a dash. Is this also the case with A207 and KH108? If so, you could do something like

WHERE substr(partner_sku, 1, instr(partner_sku, '-')-1) IN ('A1012', 'A207', 'KH108')

Now, the point about normalization is this. It appears that the prefix of partner_sku has a special meaning. In this case, the database may benefit from breaking it out into a separate column, say base_model_id, that you can then directly search on.

(5.1) By Keith Medcalf (kmedcalf) on 2021-07-04 23:23:14 edited from 5.0 in reply to 4 [source]

This will, of course, require a full scan so it will work just fine if the table ctl has 10 rows (such as when testing) but will not be performant if the table ctl has a billion rows -- depending on if other conditions in the where clause can cull the candidates which need testing to a relatively small number before this condition is tested.

If you need to do this regularly you might create an an index as follows to address this:

create index ctl_idx on ctl (substr(partner_sku, 1, instr(partner_sku, '-')-1));

after which:

select * from ctl where substr(partner_sku, 1, instr(partner_sku, '-')-1) in ('A1012', 'A207', 'KH108');

will use the index to find just the rows you are interested in.

Alternatively, you could add a virtual column to the table containing the prefix and index that, and use the virtual column in the where clause instead:

alter table ctl add column partner_sku_prefix text as (substr(partner_sku, 1, instr(partner_sku, '-')-1));
create index ctl_idx on ctl(partner_sku_prefix);

and then

select * from ctl where partner_sku_prefix in ('A1012', 'A207', 'KH108');

which will do the same thing, but you will not need to keep typing the long expression whenever you need to access the partner_sku_prefix.

(6) By Murtaza (murtazanw) on 2021-07-05 08:48:29 in reply to 4 [link] [source]

Dear Igor Tandetnik,

You guessed it perfectly right, the partner_sku is comprised of and combined with '-', the former part is family of items and the latter one is the variant. What I was trying to do is filtering those values based on their item family say 'A1012', 'A207' etc.


Thank you so much. It worked like a charm. I seriously implemented your suggestion of having a separate column (actually a separate table) having item's family. 
Here are the steps I performed;


sqlite> .import catalogue.csv ctl

sqlite> CREATE TEMP TABLE item AS SELECT partner_sku , SUBSTR ( partner_sku , 1 , INSTR ( partner_sku , '-' ) - 1 ) AS item FROM ctl;

sqlite> SELECT partner_sku, sku_child FROM ctl INNER JOIN item USING (partner_sku) WHERE item IN ( 'A1012' , 'A207' , 'KH108', 'KH247' , 'KH132' , 'KH134' );

And here is the result;

partner_sku,sku_child
A1012-20_BGN,N31450486A
A1012-20_BLK,N31450462A
A1012-20_BLU,N31450468A
A1012-20_ROS,N31450474A
A1012-20_PNK,N31450480A
A1012-24_BGN,N31450487A
A1012-24_BLU,N31450469A
A1012-24_ROS,N31450475A
A1012-24_PNK,N31450481A
A1012-24_BLK,N31450463A
A1012-28_BLU,N31450470A
A1012-28_BGN,N31450488A
A1012-28_PNK,N31450482A
A1012-28_ROS,N31450476A
A1012-28_BLK,N31450464A
A1012-3_BGN,N31450490A
A1012-3_BLU,N31450472A
A1012-3_BLK,N31450466A
A1012-3_ROS,N31450478A
A1012-3_PNK,N31450484A
A1012-32_ROS,N31450477A
A1012-32_BLU,N31450471A
A1012-32_BGN,N31450489A
A1012-32_BLK,N31450465A
A1012-32_PNK,N31450483A
A1012-4_ROS,N31450479A
A1012-4_BLK,N31450467A
A1012-4_BLU,N31450473A
A1012-4_PNK,N31450485A
A1012-4_BGN,N31450491A
A207-20_BGN,N31450456A
A207-20_ROS,N31450444A
A207-20_BLU,N31450438A
A207-20_PNK,N31450450A
A207-20_BLK,N31450432A
A207-24_BLK,N31450433A
A207-24_BGN,N31450457A
A207-24_ROS,N31450445A
A207-24_BLU,N31450439A
A207-24_PNK,N31450451A
A207-28_BLK,N31450434A
A207-28_BGN,N31450458A
A207-28_ROS,N31450446A
A207-28_BLU,N31450440A
A207-28_PNK,N31450452A
A207-3_ROS,N31450448A
A207-3_BLK,N31450436A
A207-3_BLU,N31450442A
A207-3_BGN,N31450460A
A207-3_PNK,N31450454A
A207-32_BLU,N31450441A
A207-32_BGN,N31450459A
A207-32_BLK,N31450435A
A207-32_ROS,N31450447A
A207-32_PNK,N31450453A
A207-4_BLU,N31450443A
A207-4_BGN,N31450461A
A207-4_BLK,N31450437A
A207-4_ROS,N31450449A
A207-4_PNK,N31450455A
KH108-20_BLU,N43895442A
KH108-20_BLK,N12740940A
KH108-20_BGN,N43895448A
KH108-20_BRN,N43895436A
KH108-20_BRW,N12740943A
KH108-20_MRN,N12740949A
KH108-20_NVY,N12740946A
KH108-24_BGN,N43895449A
KH108-24_BLU,N43895443A
KH108-24_BLK,N12740941A
KH108-24_BRN,N43895437A
KH108-24_MRN,N12740950A
KH108-24_NVY,N12740947A
KH108-24_BRW,N12740944A
KH108-28_BGN,N43895450A
KH108-28_BLK,N12740942A
KH108-28_BLU,N43895444A
KH108-28_BRN,N43895438A
KH108-28_MRN,N12740951A
KH108-28_NVY,N12740948A
KH108-28_BRW,N12740945A
KH108-3_BGN,N43895452A
KH108-3_BLU,N43895446A
KH108-3_BRN,N43895440A
KH108-32_BRN,N43895439A
KH108-32_BGN,N43895451A
KH108-32_BLK,N40910565A
KH108-32_BLU,N43895445A
KH108-32_BRW,N40910566A
KH108-32_NVY,N40910567A
KH108-4_BLU,N43895447A
KH108-4_BRN,N43895441A
KH108-4_BGN,N43895453A
KH132-20_COF,N18602990A
KH132-20_NVY,N18602992A
KH132-20_BLK,N18602991A
KH132-24_BLK,N18602994A
KH132-24_COF,N18602993A
KH132-24_NVY,N18602995A
KH132-28_COF,N18602996A
KH132-28_BLK,N18602997A
KH132-28_NVY,N18602998A
KH132-3_BLK,N18602988A
KH132-3_COF,N18602987A
KH132-3_NVY,N18602989A
KH134-20_BLK,N18603015A
KH134-20_SIL,N18603016A
KH134-20_BGN,N42588980A
KH134-20_NVY,N18603014A
KH134-24_BGN,N42588981A
KH134-24_BLK,N18603018A
KH134-24_SIL,N18603019A
KH134-24_NVY,N18603017A
KH134-28_BLK,N18603021A
KH134-28_BGN,N42588982A
KH134-28_SIL,N18603022A
KH134-28_NVY,N18603020A
KH134-3_BLK,N18603012A
KH134-3_BGN,N42588983A
KH134-3_SIL,N18603013A
KH134-3_NVY,N18603011A
KH247-20_NVY,N18603058A
KH247-20_BRW,N43895454A
KH247-20_BLK,N18603057A
KH247-20_BGN,N43895466A
KH247-20_BRN,N18603055A
KH247-20_MRN,N18603056A
KH247-20_BLU,N43895460A
KH247-24_BRW,N43895455A
KH247-24_BLK,N18603061A
KH247-24_BRN,N18603059A
KH247-24_MRN,N18603060A
KH247-24_BGN,N43895467A
KH247-24_NVY,N18603062A
KH247-24_BLU,N43895461A
KH247-28_BRW,N43895456A
KH247-28_BLU,N43895462A
KH247-28_BLK,N18603065A
KH247-28_NVY,N18603066A
KH247-28_BRN,N18603063A
KH247-28_BGN,N43895468A
KH247-28_MRN,N18603064A
KH247-3_BRW,N43895458A
KH247-3_BLU,N43895464A
KH247-3_BGN,N43895470A
KH247-32_BLU,N43895463A
KH247-32_BGN,N43895469A
KH247-32_BLK,N18603069A
KH247-32_BRW,N43895457A
KH247-32_NVY,N18603070A
KH247-32_MRN,N18603068A
KH247-32_BRN,N18603067A
KH247-4_BRN,N18603051A
KH247-4_BRW,N43895459A
KH247-4_BLU,N43895465A
KH247-4_BGN,N43895471A
KH247-4_BLK,N18603053A
KH247-4_NVY,N18603054A
KH247-4_MRN,N18603052A