SQLite Forum

select valid lat/lng from columns typed as text
Login

select valid lat/lng from columns typed as text

(1) By punkish on 2021-12-31 18:08:50 [link] [source]

I have a sqlite table like so

sqlite> CREATE TABLE t (lat TEXT, lng TEXT);
sqlite> INSERT INTO t VALUES (-3, 45.04);
sqlite> INSERT INTO t VALUES (-95, -22);
sqlite> INSERT INTO t VALUES ('6°', '52.35°');
sqlite> INSERT INTO t VALUES ('-34.5-6.23', 47.223);
sqlite> INSERT INTO t VALUES (6, 182);
sqlite> INSERT INTO t VALUES (-34.53, 22.45);
sqlite> select * from t;
lat         lng
----------  ------
-3          45.04
-95         -22
6°          52.35°
-34.5-6.23  47.223
6           182
-34.53      22.45

I am finding it difficult to return only those rows that are valid lat/lng, so only entries that look numeric (integer or real) and where Abs(lat) <= 90 AND Abs(lng) <= 180. So, only the first and the last rows in the table above.

SELECT * FROM t WHERE Abs(lat) <= 90 AND Abs(lng) <= 180 doesn't work because sqlite CASTs the values to number.

SELECT * from t WHERE lat GLOB '*[0-9]*' AND lng GLOB '*[0-9]*' doesn't work because it finds rows with digits nested within non-digits

SELECT * FROM t WHERE typeof(lat) = "integer" doesn't work because the columb is typed as "text"

What can I do?

(2) By Gunter Hick (gunter_hick) on 2021-12-31 18:18:00 in reply to 1 [link] [source]

It seems you are attempting to solve a presentation layer problem in the data storage layer. I would suggest making lat and long REAL and adding appropriate check constraints. Deal with input and output formats in the application.

(3) By punkish on 2021-12-31 18:43:22 in reply to 2 [link] [source]

no, I can't really do that as I need to store the wrong values as well. I guess I could make a duplicate column with lat/lng REAL and copy the values into that, but I am still curious as to how I can check in sqlite whether a value is a number or not (irrespective of the column type) without sqlite force-casting it to something it thinks is valid. For example, it converts 6º to 6 (I could live with that except my lat/lng are really decimal degrees) and -34.5-6 to -34.5

(4) By Keith Medcalf (kmedcalf) on 2021-12-31 19:06:32 in reply to 1 [link] [source]

This process is called "sanitizing your inputs" and should be performed by your APPLICATION where it is processing inputs BEFORE storing validated data in the database.

Failure to "sanitize your inputs" is a common beginner mistake.

Failure to sanitize your inputs is the basis of the "Hope and Pray" design methodology that is all the rage amongst the children these days.

select lat, 
       lng 
  from (
        select cast(lat as real) as lat, 
               cast(lng as real) as lng
          from t 
         where lat regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
           and lng regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
       ) 
 where abs(lat) <= 90 
   and abs(lng) <= 180
;

Note that you need the regexp extension loaded in order for this to work.

sqlite> CREATE TABLE t (lat TEXT, lng TEXT);
sqlite> INSERT INTO t VALUES (-3, 45.04);
sqlite> INSERT INTO t VALUES (-95, -22);
sqlite> INSERT INTO t VALUES ('6°', '52.35°');
sqlite> INSERT INTO t VALUES ('-34.5-6.23', 47.223);
sqlite> INSERT INTO t VALUES (6, 182);
sqlite> INSERT INTO t VALUES (-34.53, 22.45);
sqlite>
sqlite> select lat,
   ...>        lng
   ...>   from (
   ...>         select cast(lat as real) as lat,
   ...>                cast(lng as real) as lng
   ...>           from t
   ...>          where lat regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
   ...>            and lng regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
   ...>        )
   ...>  where abs(lat) <= 90
   ...>    and abs(lng) <= 180
   ...> ;
┌────────┬───────┐
│  lat   │  lng  │
├────────┼───────┤
│ -3.0   │ 45.04 │
│ -34.53 │ 22.45 │
└────────┴───────┘

(5) By Keith Medcalf (kmedcalf) on 2021-12-31 21:06:01 in reply to 4 [link] [source]

Note that you could do this all at once by using a table definition which would allow you to have your cake and eat it too!

create table t
(
    lat     text,
    lng     text,
    rlat AS (case when lat regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$' then cast(lat as real) else NULL end) stored,
    rlng AS (case when lng regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$' then cast(lng as real) else NULL end) stored,
    valid AS (abs(rlat) <= 90 and abs(rlng) <= 180) stored
);

INSERT INTO t VALUES (-3, 45.04);
INSERT INTO t VALUES (-95, -22);
INSERT INTO t VALUES ('6°', '52.35°');
INSERT INTO t VALUES ('-34.5-6.23', 47.223);
INSERT INTO t VALUES (6, 182);
INSERT INTO t VALUES (-34.53, 22.45);

select * from t;

Which yeilds:

sqlite> select * from t;
┌────────────┬────────┬────────┬────────┬───────┐
│    lat     │  lng   │  rlat  │  rlng  │ valid │
├────────────┼────────┼────────┼────────┼───────┤
│ -3         │ 45.04  │ -3.0   │ 45.04  │ 1     │
│ -95        │ -22    │ -95.0  │ -22.0  │ 0     │
│ 6°         │ 52.35° │        │        │       │
│ -34.5-6.23 │ 47.223 │        │ 47.223 │       │
│ 6          │ 182    │ 6.0    │ 182.0  │ 0     │
│ -34.53     │ 22.45  │ -34.53 │ 22.45  │ 1     │
└────────────┴────────┴────────┴────────┴───────┘
sqlite> select rlat, rlng from t where valid;
┌────────┬───────┐
│  rlat  │ rlng  │
├────────┼───────┤
│ -3.0   │ 45.04 │
│ -34.53 │ 22.45 │
└────────┴───────┘
sqlite>

(6) By Keith Medcalf (kmedcalf) on 2021-12-31 21:06:59 in reply to 5 [link] [source]

Note this requires fixing the regexp.c extension so it is marked as deterministic (which it should be).

(8) By Keith Medcalf (kmedcalf) on 2021-12-31 21:21:56 in reply to 6 [link] [source]

Or even:

create table t
(
    lat     text,
    lng     text,
    valid as (
              (
                   lat regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
               and lng regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
               and abs(lat) <= 90.0
               and abs(lng) <= 180.0
              ) is true
             ) stored
);

which keeps your data and table mostly the same, just with a column called "valid" which tells you whether the data is valid (or not).

sqlite> create table t
   ...> (
   ...>     lat     text,
   ...>     lng     text,
   ...>     valid as (
   ...>               (
   ...>                    lat regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
   ...>                and lng regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$'
   ...>                and abs(lat) <= 90.0
   ...>                and abs(lng) <= 180.0
   ...>               ) is true
   ...>              ) stored
   ...> );
sqlite>
sqlite> INSERT INTO t VALUES (-3, 45.04);
sqlite> INSERT INTO t VALUES (-95, -22);
sqlite> INSERT INTO t VALUES ('6°', '52.35°');
sqlite> INSERT INTO t VALUES ('-34.5-6.23', 47.223);
sqlite> INSERT INTO t VALUES (6, 182);
sqlite> INSERT INTO t VALUES (-34.53, 22.45);
sqlite> select * from t;
┌────────────┬────────┬───────┐
│    lat     │  lng   │ valid │
├────────────┼────────┼───────┤
│ -3         │ 45.04  │ 1     │
│ -95        │ -22    │ 0     │
│ 6°         │ 52.35° │ 0     │
│ -34.5-6.23 │ 47.223 │ 0     │
│ 6          │ 182    │ 0     │
│ -34.53     │ 22.45  │ 1     │
└────────────┴────────┴───────┘
sqlite>

(7) By Keith Medcalf (kmedcalf) on 2021-12-31 21:11:08 in reply to 5 [link] [source]

You might want to make the definition thus to ensure "valid" is never null:

create table t
(
    lat     text,
    lng     text,
    rlat AS (case when lat regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$' then cast(lat as real) else NULL end) stored,
    rlng AS (case when lng regexp '^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$' then cast(lng as real) else NULL end) stored,
    valid AS ((abs(rlat) <= 90 and abs(rlng) <= 180) is true) stored
);

(10) By punkish on 2022-01-01 19:07:06 in reply to 5 [link] [source]

Thanks, this was the most elegant solution. It allows me to keep the original values, wrong or right, and have a STORED flag that tells me if the row is valid or not. Works very well.

(9) By RandomCoder on 2021-12-31 21:48:44 in reply to 1 [source]

If you're not set on coercing all of the values to strings, you can drop the affinity for the columns:

CREATE TABLE t (lat, lng);
INSERT INTO t VALUES (-3, 45.04);
INSERT INTO t VALUES (-95, -22);
INSERT INTO t VALUES ('6°', '52.35°');
INSERT INTO t VALUES ('-34.5-6.23', 47.223);
INSERT INTO t VALUES (6, 182);
INSERT INTO t VALUES (-34.53, 22.45);
SELECT lat, lng FROM t WHERE typeof(lat) in ('real', 'integer') AND typeof(lng) in ('real', 'integer') AND abs(lat) <= 90 AND abs(lng) <= 180;

Though, honestly, if I were tasked with this problem, I'd keep the "pre-sanitized" data in one table (if it must be kept), and the "post-sanitized" subset in another table, perhaps with a link of some sort. To my mind they're two different things, and I'd expect to perfect different actions on them. Along with the possibility of adding custom sanitization logic like turning '-34.5-6.23' into -40.73 or '40°10'30"' to 40.175 when storing it in the sanitized dataset if it made sense for some business logic reason.

Pushing this down to the data storage level is going to hit all sorts off issues, I'm sure.