geopoly performance

(1.1) By punkish on 2020-08-17 07:22:32 edited from 1.0

I am back with a question about geopoly and performance. I have tables like so:

CREATE TABLE a (a_id);
CREATE TABLE b (b_id, a_id, lat, lon);

sqlite> SELECT Count(*) FROM a;
370313
sqlite> SELECT Count(*) FROM b;
397008
sqlite> SELECT Count(*) FROM b WHERE lon != '' AND lat != '';
134681

I create a geopoly table and load it with tiny triangles generated around each point with a delta of 0.0001 degrees (from what I understand, geopoly and r*tree tables can't deal with points… they need polys):

CREATE VIRTUAL TABLE vloc USING geopoly(a_id, b_id);
INSERT INTO vloc (a_id, b_id, _shape)
SELECT a.a_id, b_id, geopoly_regular(lon, lat, 0.0001, 3) AS _shape
FROM a JOIN b ON a.a_id = b.a_id
WHERE lon != '' AND lat != '';
sqlite> SELECT Count(*) FROM vloc;
134681

I want to be able to find all rows in table a within, say, 10 km of a given lon, lat. I use the following two queries as a test, with a radius of 0.1 (1 degree is ~111 km near the equator, so I am just using 0.1 degrees as an approximation here):

sqlite> SELECT DISTINCT a_id FROM vloc WHERE geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 4)) != 0;
(… 8 rows are returned …)
Run Time: real 0.400 user 0.348746 sys 0.048510
sqlite> SELECT DISTINCT a.a_id FROM a JOIN b ON a.a_id = b.a_id WHERE lat BETWEEN -0.1 AND 0.1 AND lon BETWEEN -0.1 AND 0.1;
(… 9 rows are returned …)
Run Time: real 0.027 user 0.001259 sys 0.006363
sqlite>

The results are not going to be exact because the two queries are slightly different, but they are close enough for me. However, the geopoly query takes about 15 times longer (0.400 s vs 0.027 s real time) than a simple BETWEEN over a JOIN.
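Part of the 8 vs 9 row difference may simply be coverage. Here is a quick arithmetic check in Python, assuming (as the SQLite geopoly documentation describes) that the third argument of geopoly_regular is the circumradius. The function name is only illustrative. Under that assumption the 4-sided polygon covers half the area of the BETWEEN box, and geopoly_within additionally requires each whole triangle to lie inside the polygon.

```python
import math

def regular_polygon_area(circumradius: float, sides: int) -> float:
    """Area of a regular polygon from its circumradius: (n/2) * R^2 * sin(2*pi/n)."""
    return 0.5 * sides * circumradius ** 2 * math.sin(2 * math.pi / sides)

# Region covered by geopoly_regular(0, 0, 0.1, 4), in square degrees.
square_area = regular_polygon_area(0.1, 4)
# Region covered by lat BETWEEN -0.1 AND 0.1 AND lon BETWEEN -0.1 AND 0.1.
box_area = 0.2 * 0.2
# Region covered by the 20-sided polygon from the update at the end of this post.
icosagon_area = regular_polygon_area(0.1, 20)

print(f"4 sides: {square_area:.4f}, 20 sides: {icosagon_area:.4f}, box: {box_area:.4f}")
```

So the two test queries are not searching the same region, which is worth keeping in mind when comparing their row counts.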

The reason I want to use geopoly is that I can use other geospatial libraries to convert from lat, lon to cartesian coordinates and allow more natural queries such as "all the rows within x km of a given point", but some of the queries are two or even three orders of magnitude slower with geopoly.
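For comparison, the BETWEEN approach can also take a radius in kilometers. A minimal sketch, assuming the rough 111 km per degree figure used above and ignoring the poles and the antimeridian; bbox_degrees and KM_PER_DEGREE are hypothetical names, not part of any library:

```python
import math

# Rough conversion used in this post; an approximation, not a geodesic constant.
KM_PER_DEGREE = 111.0

def bbox_degrees(lon: float, lat: float, radius_km: float):
    """Return (min_lon, max_lon, min_lat, max_lat) enclosing a circle of radius_km.

    Longitude degrees shrink by cos(latitude), so the box is widened accordingly.
    Not valid at the poles or across the +/-180 degree meridian.
    """
    dlat = radius_km / KM_PER_DEGREE
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return (lon - dlon, lon + dlon, lat - dlat, lat + dlat)

# Bounds for "within about 10 km of (0, 0)".
min_lon, max_lon, min_lat, max_lat = bbox_degrees(0.0, 0.0, 10.0)
```

The four values would then be bound to the lat BETWEEN ... AND lon BETWEEN ... query in place of the hard-coded -0.1 and 0.1.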

What am I doing wrong? How can I improve the speed of geopoly?

Update: In my application, I use geopoly_within(_shape, @poly) where I compute the value for @poly using a geospatial library so I can use a radius in kilometers. The SELECT is even slower when using a circle. In fact, even with geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 20)) in the above query, the run time degrades from 0.400 to 0.636 seconds.
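To illustrate what computing @poly might look like without a geospatial library, here is a rough sketch. It assumes geopoly accepts a GeoJSON-style array of [x, y] vertices whose last vertex repeats the first, and it reuses the flat 111 km per degree approximation rather than a proper projection. circle_polygon and KM_PER_DEGREE are hypothetical names.

```python
import json
import math

# Rough conversion used in this post; an approximation, not a geodesic constant.
KM_PER_DEGREE = 111.0

def circle_polygon(lon: float, lat: float, radius_km: float, sides: int = 12) -> str:
    """Approximate a circle of radius_km around (lon, lat) as a closed polygon.

    Returns JSON text of the form [[x0, y0], [x1, y1], ..., [x0, y0]] with the
    vertices in counterclockwise order.
    """
    if sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    dlat = radius_km / KM_PER_DEGREE
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    vertices = []
    for i in range(sides):
        angle = 2.0 * math.pi * i / sides  # increasing angle gives counterclockwise order
        vertices.append([lon + dlon * math.cos(angle), lat + dlat * math.sin(angle)])
    vertices.append(list(vertices[0]))  # close the ring by repeating the first vertex
    return json.dumps(vertices)

# Text that would be bound to @poly in geopoly_within(_shape, @poly).
poly = circle_polygon(0.0, 0.0, 10.0, sides=8)
```

The sides parameter is the knob that matters for this question: going from 4 to 20 sides already moved the query from 0.400 to 0.636 seconds, so a coarser polygon is cheaper at the cost of a less round search region.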