SQLite Forum

another performance question related to `geopoly`
Login
I am back with a performance question about `geopoly_within` using the `geopoly` extension. First, the schema (showing only the columns relevant to the queries)

```sql

-- ~350K rows
CREATE TABLE treatments (  
    treatmentId TEXT NOT NULL UNIQUE, 
    treatmentTitle TEXT
);

-- ~397008 rows of which ~135K have latitude and longitude values
CREATE TABLE materialsCitations ( 
    materialsCitationId TEXT NOT NULL, 
    treatmentId TEXT NOT NULL, 
    latitude REAL, 
    longitude REAL,
    UNIQUE (materialsCitationId, treatmentId)
);

CREATE VIRTUAL TABLE vtreatments USING FTS5(treatmentId, fullText);
/* vtreatments(treatmentId,fullText) */;

CREATE VIRTUAL TABLE vlocations USING geopoly(treatmentId, materialsCitationId)
/* vlocations(_shape,treatmentId,materialsCitationId) */;
```

Since I only have point data, I have filled the `vlocations` table with really tiny polygons constructed around each point. Now, on to the queries…

First, a fulltext search which is very fast, as expected

```sql
sqlite> .timer on
sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM vtreatments WHERE vtreatments MATCH 'Meshram';
QUERY PLAN
∟--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
Run Time: real 0.000 user 0.000070 sys 0.000010

sqlite> SELECT Count(*) AS num FROM vtreatments WHERE vtreatments MATCH 'Meshram';
35
Run Time: real 0.014 user 0.001111 sys 0.003128
```

Then, given the following `poly`, using a radius of 50 kms around `"lat":25.6532, "lng":3.48`

```js
poly = '[[3.9291576420597614,25.653199999999963],[3.9205272039128367,25.57418480393311],[3.8949675523700438,25.498156902884112],[3.853460930506155,25.428046553409356],[3.797602414522217,25.366560752342917],[3.7295386158616766,25.316077273341982],[3.6518851881364403,25.278551030320266],[3.567626309025444,25.25543670701288],[3.4800000000000004,25.24763096138966],[3.3923736909745563,25.25543670701288],[3.30811481186356,25.278551030320266],[3.230461384138324,25.316077273341982],[3.1623975854777835,25.366560752342917],[3.106539069493846,25.428046553409356],[3.0650324476299566,25.498156902884112],[3.0394727960871637,25.57418480393311],[3.03084235794024,25.653199999999963],[3.0394727960871637,25.73216289757996],[3.0650324476299566,25.808041865571695],[3.106539069493846,25.877929322171752],[3.162397585477784,25.93915220501075],[3.2304613841383247,25.989372767781138],[3.3081148118635615,26.026676122749052],[3.3923736909745577,26.049641517820053],[3.4800000000000013,26.05739496695326],[3.5676263090254454,26.049641517820053],[3.6518851881364416,26.026676122749052],[3.7295386158616783,25.989372767781138],[3.7976024145222183,25.93915220501075],[3.853460930506156,25.877929322171752],[3.894967552370044,25.808041865571695],[3.920527203912837,25.73216289757995],[3.9291576420597614,25.653199999999963]]'
```

a `geopoly_within` search which is not as fast as an FTS search, but still reasonable

```sql
sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM vlocations WHERE geopoly_within(_shape, poly) != 0;
QUERY PLAN
∟--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan
Run Time: real 0.000 user 0.000087 sys 0.000019

sqlite> SELECT Count(*) AS num FROM vlocations WHERE geopoly_within(_shape, poly) != 0;
2
Run Time: real 1.606 user 1.527031 sys 0.059115
```

Finally, the really problematic query wherein I combine the above two queries

```sql
sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId JOIN vlocations ON treatments.treatmentId = vlocations.treatmentId WHERE vtreatments MATCH 'Meshram' AND geopoly_within(_shape, poly) != 0;
QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
∟--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan
Run Time: real 0.002 user 0.000289 sys 0.001414

sqlite> SELECT Count(*) AS num FROM treatments JOIN vtreatments ON treatments.treatmentId
= vtreatments.treatmentId JOIN vlocations ON treatments.treatmentId = vlocations.treatmentId WHERE vtreatments MATCH 'Meshram' AND geopoly_within(_shape, poly) != 0;
2
Run Time: real 46.382 user 44.862766 sys 1.394483
```

For comparison, below I am using FTS5 but not the VIRTUAL vlocations table

```sql
sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude = 25.6532 AND longitude = 3.48;
QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
∟--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (ANY(deleted) AND treatmentId=?)
Run Time: real 0.001 user 0.000278 sys 0.000701

sqlite> SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude = 25.6532 AND longitude = 3.48;
2
Run Time: real 0.047 user 0.002550 sys 0.012607
```

And finally, the same query as above except that I am searching inside a box to simulate a `geopoly_within` query

```sql
sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude BETWEEN 25.6530 AND 25.6534 AND longitude BETWEEN 3.47 AND 3.49;
QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
∟--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (ANY(deleted) AND treatmentId=?)
Run Time: real 0.000 user 0.000237 sys 0.000185

sqlite> SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude BETWEEN 25.6530 AND 25.6534 AND longitude BETWEEN 3.47 AND 3.49;
2
Run Time: real 0.002 user 0.001776 sys 0.000296
```

Summary: the VIRTUAL locations queries are a few orders of magnitude slower than FTS5 queries, but combining VIRTUAL location with VIRTUAL text really slows down to a crawl. Seems like I am better off completely abandoning RTree index and `geopoly`. I am using the latter *only* because it enables me to use `geopoly_within`, thereby allowing me to do radius searches.

Question: Am I doing something wrong? Can I improve the performance of `geopoly`, esp when joining a `geopoly` table to other tables?