geopoly performance

### (1.1) By punkish on 2020-08-17 07:22:32 edited from 1.0

I am back with a question about `geopoly` and performance. I have tables like so:

``````
CREATE TABLE a (a_id);
CREATE TABLE b (b_id, a_id, lat, lon);

sqlite> SELECT Count(*) FROM a;
370313
sqlite> SELECT Count(*) FROM b;
397008
sqlite> SELECT Count(*) FROM b WHERE lon != '' AND lat != '';
134681
``````

I create a `geopoly` table and load it with tiny triangles generated around each point with a delta of 0.0001 degrees (from what I understand, `geopoly` and `r*tree` tables can't deal with points… they need polys):

``````
CREATE VIRTUAL TABLE vloc USING geopoly(a_id, b_id);
INSERT INTO vloc (a_id, b_id, _shape) SELECT a.a_id, b_id, geopoly_regular(lon, lat, 0.0001, 3) AS _shape FROM a JOIN b ON a.a_id = b.a_id WHERE lon != '' AND lat != '';
sqlite> SELECT Count(*) FROM vloc;
134681
``````

I want to be able to find all rows in table `a` within, say, 10 km of a given `lon`, `lat`. I use the following two queries as a test, with a `radius` of `0.1` (1 degree is ~111 km near the equator, so I am just using 0.1 degrees as an approximation here):

``````
sqlite> SELECT DISTINCT a_id FROM vloc WHERE geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 4)) != 0;
(… 8 rows are returned …)
Run Time: real 0.400 user 0.348746 sys 0.048510
sqlite> SELECT DISTINCT a.a_id FROM a JOIN b ON a.a_id = b.a_id WHERE lat BETWEEN -0.1 AND 0.1 AND lon BETWEEN -0.1 AND 0.1;
(… 9 rows are returned …)
Run Time: real 0.027 user 0.001259 sys 0.006363
sqlite>
``````

The results are not going to be identical because the two queries are slightly different, but they are close enough for me. However, using `geopoly` takes about 15 times longer (0.400 s versus 0.027 s) than just doing a simple `BETWEEN` over a `JOIN`.
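For reference, the degree approximation behind the `BETWEEN` query can be written down as a small helper. This is only a sketch: `bbox_degrees` and `KM_PER_DEGREE` are hypothetical names, the 111 km figure is the rough equatorial value mentioned above, the longitude correction by `cos(lat)` is an added assumption that matters away from the equator, and the sample rows are made up.

```python
import math
import sqlite3

KM_PER_DEGREE = 111.0  # rough figure from the post; treated as constant here


def bbox_degrees(lon, lat, radius_km):
    """Return (min_lon, max_lon, min_lat, max_lat) enclosing radius_km around a point.

    Not valid at the poles, where cos(lat) is zero.
    """
    dlat = radius_km / KM_PER_DEGREE
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return (lon - dlon, lon + dlon, lat - dlat, lat + dlat)


# Tiny in-memory stand-in for tables a and b (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id);
    CREATE TABLE b (b_id, a_id, lat, lon);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (10, 1, 0.05, 0.05), (11, 2, 0.5, 0.5), (12, 3, '', '');
""")

# Same shape as the BETWEEN query above, with the bounds derived from 10 km.
min_lon, max_lon, min_lat, max_lat = bbox_degrees(0.0, 0.0, 10.0)
rows = conn.execute(
    """SELECT DISTINCT a.a_id FROM a JOIN b ON a.a_id = b.a_id
       WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?""",
    (min_lat, max_lat, min_lon, max_lon),
).fetchall()
print(rows)
```

The box is still a square in degrees, so it over-selects near the corners compared with a true 10 km circle.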

The reason I want to use `geopoly` is that I can use other geospatial libraries to convert from lat, lon to Cartesian coordinates and allow more natural queries such as "all the rows within `x` km of a given point", but some of the queries are two or even three orders of magnitude slower with `geopoly`.

What am I doing wrong? How can I improve the speed of `geopoly`?

Update: In my application, I use `geopoly_within(_shape, @poly)`, where I compute the value for `@poly` using a geospatial library so I can use a radius in kilometers. The `SELECT` is even slower when using a circle. In fact, even with `geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 20))` in the above query, the run time degrades from 0.400 to 0.636 seconds.
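To make the `@poly` step concrete, here is a minimal sketch of building such a circle-like polygon from a radius in kilometers. It is not the geospatial library used in the application: `circle_polygon_json` is a hypothetical helper, it relies on the same flat 111 km per degree approximation, and it assumes `geopoly` accepts its usual GeoJSON-style text form (an array of `[x, y]` vertices in counter-clockwise order, with the first vertex repeated at the end).

```python
import json
import math

KM_PER_DEGREE = 111.0  # rough equatorial figure; an approximation, not geodesy


def circle_polygon_json(lon, lat, radius_km, sides=20):
    """Return a JSON polygon approximating a circle of radius_km around (lon, lat)."""
    dlat = radius_km / KM_PER_DEGREE
    # Stretch the longitude radius by 1/cos(lat); not valid at the poles.
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    vertices = []
    for i in range(sides):
        # Increasing angle walks the vertices counter-clockwise.
        angle = 2.0 * math.pi * i / sides
        vertices.append([lon + dlon * math.cos(angle),
                         lat + dlat * math.sin(angle)])
    # Close the ring by repeating the first vertex.
    vertices.append(list(vertices[0]))
    return json.dumps(vertices)


poly = circle_polygon_json(0.0, 0.0, 10.0, sides=20)
print(poly[:60], "...")
```

The resulting string would then be bound as the polygon parameter, for example `conn.execute("SELECT DISTINCT a_id FROM vloc WHERE geopoly_within(_shape, :poly) != 0", {"poly": poly})`, assuming the SQLite build in use has `geopoly` compiled in. More sides give a closer circle but, as the timings above suggest, also a slower `geopoly_within` test.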