SQLite User Forum

finding rows that contain a non-numeric character
Login

finding rows that contain a non-numeric character

(1) By punkish on 2022-10-14 19:46:26 [link] [source]

lacking a regexp extension compiled in my version of sqlite3, how can I SELECT foo FROM table WHERE Instr(foo, /[a-zA-Z]/) > 0 (or something similar). That is, identify all rows that might contain any non-numeric character?

(2) By Michael A. Cleverly (cleverly) on 2022-10-14 20:13:50 in reply to 1 [link] [source]

You could check to make sure length(foo) > 0 then use the replace() function to remove all digits 0-9.

If foo only contained digits then the length() after replacement will be zero, so if the length() isn't zero there must be non-digits present.

Something like:

SELECT foo
  FROM tbl
 WHERE length(foo) > 0
   AND length(replace(replace(replace(
              replace(replace(replace(
              replace(replace(replace(replace(foo,
       '9', ''), '8', ''), '7', ''),
       '6', ''), '5', ''), '4', ''),
       '3', ''), '2', ''), '1', ''), '0', '')) > 0;

(3) By punkish on 2022-10-14 20:20:17 in reply to 2 [link] [source]

Thank you, I would have preferred to replace that with something simpler, but it is indeed very cleverly done, ahem. It works well.

(5) By Michael A. Cleverly (cleverly) on 2022-10-14 20:31:08 in reply to 3 [link] [source]

Actually, on second thought, we can replace all those replace() with a trim():

SELECT foo
  FROM tbl
 WHERE length(foo) > 0
   AND length(trim(foo, '0123456789')) > 0;

(7) By punkish on 2022-10-14 20:35:28 in reply to 5 [link] [source]

that would not work because Trim() removes the specified chars from either end of the string. Here is an example of a fail

> SELECT Trim('098ahif792hsk910', '0123456789');
ahif792hsk

(8) By Michael A. Cleverly (cleverly) on 2022-10-14 21:44:49 in reply to 7 [source]

Exactly. Since the result, ahif792hsk has a positive length you know that the original string must have contained non-numeric characters because length('ahif792hsk') > 0.

If it had only contained numeric characters, after trimming all digits, the length would have been 0.

Now, if you need to know the precise number of non-digits, then the trim() won't work (but the nested replace() would).

(9) By anonymous on 2022-10-14 21:55:34 in reply to 5 [link] [source]

The built-in glob operator understands negated character classes:

   select foo from tbl
       where foo glob '*[^0-9]*';

(10) By punkish on 2022-10-15 16:12:20 in reply to 9 [link] [source]

this is wonderful, thanks! I didn't know about glob and now it seems I can do much with it that I would with a regexp extension

(4) By Chris Locke (chrisjlocke1) on 2022-10-14 20:29:33 in reply to 2 [link] [source]

I had something similar to this at work when perusing a computed column. Someone wanted to sentence case station names, so used this method. Replacing various connotations of alphabetic characters resulted in about 78 of these replace statements. It wasn't pretty ... it doesn't scale well.

(6.1) By punkish on 2022-10-14 20:53:56 edited from 6.0 in reply to 4 [link] [source]

it would be nice if there were a factory-supplied regexp capability, maybe via a compile time option, just like json, rtree, fts5, etc.