SQLite Forum

[SELECT] Simple way to search for data that start with given pattern?
Login

[SELECT] Simple way to search for data that start with given pattern?

(1) By Gilles on 2020-11-30 20:46:59 [link] [source]

Hello,

I need to find all the records where the ZIP code starts with such and such digits.

Apparently, SQLite doesn't support regexes ("Error: no such function: REGEXP"), so that won't do:
SELECT COUNT(*) FROM MyTable WHERE zip REGEXP '^(01|04|54)';
SELECT COUNT(*) FROM MyTable WHERE zip REGEXP '^01|^04|^54');

Is there something easier than a long list of LIKE?
SELECT COUNT(*) FROM MyTable WHERE zip LIKE "01%" OR LIKE "04%" etc.

Thank you.

(2) By Keith Medcalf (kmedcalf) on 2020-12-01 01:21:40 in reply to 1 [link] [source]

You can indeed just load the REGEXP function. One such implementation of the regexp function can be found here:

https://www.sqlite.org/src/file?name=ext/misc/regexp.c&ci=tip

(3) By Keith Medcalf (kmedcalf) on 2020-12-01 01:24:31 in reply to 1 [source]

And your syntax is incorrect.

SELECT COUNT(*) FROM MyTable WHERE zip LIKE "01%" OR LIKE "04%" etc.

should be

select count(*) from mytable where zip like '01%' or zip like '04%' etc.

double quotes are for identifiers. single quotes are for strings.

Unlike English (or COBOL) you cannot elide the subject of the likeness.

(4) By Ryan Smith (cuz) on 2020-12-01 10:24:01 in reply to 1 [link] [source]

This is not really answering the question about "something easier" to achieve yoru goal (because as Keith pointed out, there is no "something easier", though you can load the REGEX function itself and use that).

However, in the specific query you are trying to achieve, the most concise way to put it (barring loading the REGEX function), would be:

SELECT COUNT(*) FROM MyTable WHERE SUBSTR(zip,1,2) IN ('01','04','54');

The problem with this approach (and the REGEX approach, unless there's an optimization I'm unaware of) is that they will be slower (assuming the "zip" column is indexed) than "... LIKE '01%' OR LIKE '04%' OR ..." because LIKE has an optimization whereby, if the searched-for entity is of the form "xxx%" (as opposed to '%xxx' or '%xx%') then it can use the Index, resulting in a much faster lookup.

(5) By Keith Medcalf (kmedcalf) on 2020-12-01 10:37:01 in reply to 4 [link] [source]

Presumably if one were to create an index on the expression SUBSTR(zip,1,2) then the planner would put the temporary table for the IN list in the outer loop and descend into MyTable using that index.

LIKE would, of course, only use a case-insensitive index on zip unless it were told that like were case_sensitive -- the fact that zip contains numbers only is quite irrelevant to whether a text field is case sensitive or not.

(6) By anonymous on 2020-12-01 10:46:48 in reply to 1 [link] [source]

Maybe incorporate GLOB() in some way

(7) By Simon Slavin (slavin) on 2020-12-01 12:12:03 in reply to 1 [link] [source]

There are a number of different ways this problem can be solved, but they each have costs: programming time, storage space, etc.. To help the people trying to give you an answer suited to your application, can you tell us

(A) How many rows you have in MyTable ? Is it more like thousands or millions or billions ?
(B) Is it important in your app that answers can be returned in less than a second, or is a delay of a couple of seconds acceptable, or is a longer delay acceptable ?