I am back with a performance question about `geopoly_within` using the `geopoly` extension. First, the schema (showing only the columns relevant to the queries) ```sql -- ~350K rows CREATE TABLE treatments ( treatmentId TEXT NOT NULL UNIQUE, treatmentTitle TEXT ); -- ~397008 rows of which ~135K have latitude and longitude values CREATE TABLE materialsCitations ( materialsCitationId TEXT NOT NULL, treatmentId TEXT NOT NULL, latitude REAL, longitude REAL, UNIQUE (materialsCitationId, treatmentId) ); CREATE VIRTUAL TABLE vtreatments USING FTS5(treatmentId, fullText); /* vtreatments(treatmentId,fullText) */; CREATE VIRTUAL TABLE vlocations USING geopoly(treatmentId, materialsCitationId) /* vlocations(_shape,treatmentId,materialsCitationId) */; ``` Since I only have point data, I have filled the `vlocations` table with really tiny polygons constructed around each point. Now, on to the queries… First, a fulltext search which is very fast, as expected ```sql sqlite> .timer on sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM vtreatments WHERE vtreatments MATCH 'Meshram'; QUERY PLAN ∟--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m Run Time: real 0.000 user 0.000070 sys 0.000010 sqlite> SELECT Count(*) AS num FROM vtreatments WHERE vtreatments MATCH 'Meshram'; 35 Run Time: real 0.014 user 0.001111 sys 0.003128 ``` Then, given the following `poly`, using a radius of 50 kms around `"lat":25.6532, "lng":3.48` ```js poly = '[[3.9291576420597614,25.653199999999963],[3.9205272039128367,25.57418480393311],[3.8949675523700438,25.498156902884112],[3.853460930506155,25.428046553409356],[3.797602414522217,25.366560752342917],[3.7295386158616766,25.316077273341982],[3.6518851881364403,25.278551030320266],[3.567626309025444,25.25543670701288],[3.4800000000000004,25.24763096138966],[3.3923736909745563,25.25543670701288],[3.30811481186356,25.278551030320266],[3.230461384138324,25.316077273341982],[3.1623975854777835,25.366560752342917],[3.106539069493846,25.428046553409356],[3.0650324476299566,25.498156902884112],[3.0394727960871637,25.57418480393311],[3.03084235794024,25.653199999999963],[3.0394727960871637,25.73216289757996],[3.0650324476299566,25.808041865571695],[3.106539069493846,25.877929322171752],[3.162397585477784,25.93915220501075],[3.2304613841383247,25.989372767781138],[3.3081148118635615,26.026676122749052],[3.3923736909745577,26.049641517820053],[3.4800000000000013,26.05739496695326],[3.5676263090254454,26.049641517820053],[3.6518851881364416,26.026676122749052],[3.7295386158616783,25.989372767781138],[3.7976024145222183,25.93915220501075],[3.853460930506156,25.877929322171752],[3.894967552370044,25.808041865571695],[3.920527203912837,25.73216289757995],[3.9291576420597614,25.653199999999963]]' ``` a `geopoly_within` search which is not as fast as an FTS search, but still reasonable ```sql sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM vlocations WHERE geopoly_within(_shape, poly) != 0; QUERY PLAN ∟--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan Run Time: real 0.000 user 0.000087 sys 0.000019 sqlite> SELECT Count(*) AS num FROM vlocations WHERE geopoly_within(_shape, poly) != 0; 2 Run Time: real 1.606 user 1.527031 sys 0.059115 ``` Finally, the really problematic query wherein I combine the above two queries ```sql sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId JOIN vlocations ON treatments.treatmentId = vlocations.treatmentId WHERE vtreatments MATCH 'Meshram' AND geopoly_within(_shape, poly) != 0; QUERY PLAN |--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m |--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?) ∟--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan Run Time: real 0.002 user 0.000289 sys 0.001414 sqlite> SELECT Count(*) AS num FROM treatments JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId JOIN vlocations ON treatments.treatmentId = vlocations.treatmentId WHERE vtreatments MATCH 'Meshram' AND geopoly_within(_shape, poly) != 0; 2 Run Time: real 46.382 user 44.862766 sys 1.394483 ``` For comparison, below I am using FTS5 but not the VIRTUAL vlocations table ```sql sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude = 25.6532 AND longitude = 3.48; QUERY PLAN |--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m |--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?) ∟--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (ANY(deleted) AND treatmentId=?) Run Time: real 0.001 user 0.000278 sys 0.000701 sqlite> SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude = 25.6532 AND longitude = 3.48; 2 Run Time: real 0.047 user 0.002550 sys 0.012607 ``` And finally, the same query as above except that I am searching inside a box to simulate a `geopoly_within` query ```sql sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude BETWEEN 25.6530 AND 25.6534 AND longitude BETWEEN 3.47 AND 3.49; QUERY PLAN |--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m |--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?) ∟--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (ANY(deleted) AND treatmentId=?) Run Time: real 0.000 user 0.000237 sys 0.000185 sqlite> SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude BETWEEN 25.6530 AND 25.6534 AND longitude BETWEEN 3.47 AND 3.49; 2 Run Time: real 0.002 user 0.001776 sys 0.000296 ``` Summary: the VIRTUAL locations queries are a few orders of magnitude slower than FTS5 queries, but combining VIRTUAL location with VIRTUAL text really slows down to a crawl. Seems like I am better off completely abandoning RTree index and `geopoly`. I am using the latter *only* because it enables me to use `geopoly_within`, thereby allowing me to do radius searches. Question: Am I doing something wrong? Can I improve the performance of `geopoly`, esp when joining a `geopoly` table to other tables?