SQLite Forum

40 forum posts by user punkish

2022-01-01
19:07 Reply: select valid lat/lng from columns typed as text (artifact: 9706557aae user: punkish)

Thanks, this was the most elegant solution. It allows me to keep the original values, wrong or right, and have a STORED flag that tells me if the row is valid or not. Works very well.
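
For anyone who finds this thread later, here is a sketch of what such a STORED flag can look like (my own reconstruction, not necessarily the exact expression that was suggested; it leans on SQLite's comparison affinity rules, under which comparing a TEXT-affinity column with its REAL cast only succeeds when the text is a well-formed number):

CREATE TABLE t (
    lat TEXT,
    lng TEXT,

    -- 1 when both values are well-formed numbers within the valid
    -- lat/lng ranges, 0 otherwise. Generated columns need SQLite 3.31
    -- or later; non-canonical but valid forms such as '45.040' may
    -- still be flagged invalid by the round-trip comparison.
    is_valid INTEGER GENERATED ALWAYS AS (
        lat = CAST(lat AS REAL) AND lng = CAST(lng AS REAL)
        AND abs(lat) <= 90 AND abs(lng) <= 180
    ) STORED
);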

2021-12-31
18:43 Reply: select valid lat/lng from columns typed as text (artifact: d59530c897 user: punkish)

no, I can't really do that as I need to store the wrong values as well. I guess I could make duplicate lat/lng columns typed REAL and copy the values into those, but I am still curious how I can check in sqlite whether a value is a number or not (irrespective of the column type) without sqlite force-casting it to something it thinks is valid. For example, it converts 6° to 6 (I could live with that, except my lat/lng are really decimal degrees) and -34.5-6.23 to -34.5.

18:08 Post: select valid lat/lng from columns typed as text (artifact: 3ab3e7ce80 user: punkish)

I have a sqlite table like so

sqlite> CREATE TABLE t (lat TEXT, lng TEXT);
sqlite> INSERT INTO t VALUES (-3, 45.04);
sqlite> INSERT INTO t VALUES (-95, -22);
sqlite> INSERT INTO t VALUES ('6°', '52.35°');
sqlite> INSERT INTO t VALUES ('-34.5-6.23', 47.223);
sqlite> INSERT INTO t VALUES (6, 182);
sqlite> INSERT INTO t VALUES (-34.53, 22.45);
sqlite> select * from t;
lat         lng
----------  ------
-3          45.04
-95         -22
6°          52.35°
-34.5-6.23  47.223
6           182
-34.53      22.45

I am finding it difficult to return only those rows that are valid lat/lng, that is, only entries that look numeric (integer or real) and where Abs(lat) <= 90 AND Abs(lng) <= 180. So, only the first and the last rows in the table above.

SELECT * FROM t WHERE Abs(lat) <= 90 AND Abs(lng) <= 180 doesn't work because sqlite CASTs the values to number.

SELECT * from t WHERE lat GLOB '*[0-9]*' AND lng GLOB '*[0-9]*' doesn't work because it finds rows with digits nested within non-digits

SELECT * FROM t WHERE typeof(lat) = "integer" doesn't work because the column is typed as "text"

What can I do?

2021-12-26
12:12 Edit: emulating UPSERT on VIRTUAL table (artifact: 25e767ebd1 user: punkish)

I realize that UPSERTs are not supported on VIRTUAL tables (and neither are UNIQUE constraints). Nonetheless, I want to avoid duplicate fooid and want to update a row if it already exists or insert if it doesn't. Consider

CREATE VIRTUAL TABLE IF NOT EXISTS vfoo USING FTS5(fooid, footext)

Of course, I could check for the existence of a row, and then do an INSERT or an UPDATE as required, but is there a more elegant, one-step way to emulate an UPSERT on a virtual table (fts5 as well as rtree and geopoly)?
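
One workaround that comes to mind (a sketch with placeholder values, not a true single-statement UPSERT) is to blindly delete any existing row with that fooid and then insert, wrapped in a transaction so the pair behaves atomically:

BEGIN;
-- fooid is not a UNIQUE or indexed column in FTS5, so this DELETE scans the table
DELETE FROM vfoo WHERE fooid = 'abc123';
INSERT INTO vfoo (fooid, footext) VALUES ('abc123', 'the new text');
COMMIT;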

11:52 Edit: emulating UPSERT on VIRTUAL table (artifact: 641ed45e11 user: punkish)

I realize that UPSERTs are not supported on VIRTUAL tables (and neither are UNIQUE constraints). Nonetheless, I want to avoid duplicates and want to update a row if it already exists or insert if it doesn't. Consider

CREATE VIRTUAL TABLE IF NOT EXISTS vfoo USING FTS5(fooid, footext)

Of course, I could check for the existence of a row, and then do an INSERT or an UPDATE as required, but is there a more elegant, one-step way to emulate an UPSERT on a virtual table (fts5 as well as rtree and geopoly)?

11:51 Post: emulating UPSERT on VIRTUAL table (artifact: f3f73221cc user: punkish)

I realize that UPSERTs are not supported on VIRTUAL tables. Nonetheless, I want to avoid duplicates and want to update a row if it already exists or insert if it doesn't. Consider

CREATE VIRTUAL TABLE IF NOT EXISTS vfoo USING FTS5(fooid, footext)

Of course, I could check for the existence of a row, and then do an INSERT or an UPDATE as required, but is there a more elegant, one-step way to emulate an UPSERT on a virtual table (fts5 as well as rtree and geopoly)?

2020-09-15
14:29 Reply: another performance question related to `geopoly` (artifact: 25d2c5c5b4 user: punkish)

the reason I was using geopoly was that I wanted to use the geopoly_within function which, as the docs imply, is supposed to be very fast because it uses the RTree indexes. I must say, this is the first instance of my being disappointed with SQLite's performance. I am hardly better than an intermediate user, but I have been using SQLite forever, and I don't ever recall throwing anything at it and not being amazed at how wonderfully it performs. Until now, with geopoly. Given the additional overhead of having to create and maintain a VIRTUAL locations table, it hardly seems worth the effort in light of its lackluster performance.

In my case, using RTree alone makes no sense as I can get very fast (though slightly less precise) results doing just a naive bbox query using min and max lats and lngs.

And no, Spatialite is not an option. I would much rather just jump to PostGIS if it ever came to that. Though I seriously hope it won't. I do love the simplicity and agility of SQLite too much to give it up (easily).

14:23 Reply: another performance question related to `geopoly` (artifact: dfc84c2126 user: punkish)

Thanks Keith. I tried your suggestion, and yes, it is much better, much more performant. It is as fast as doing a simple geopoly query in the examples in my OP. Here it is

SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> .eqp on
sqlite> .timer on
sqlite> SELECT Count(*) FROM treatments WHERE treatmentId IN (SELECT treatmentid FROM vtreatments WHERE vtreatments MATCH 'meshram' INTERSECT SELECT treatmentid FROM vlocations WHERE geopoly_within(_shape, '[[3.9291576420597614,25.653199999999963],[3.9205272039128367,25.57418480393311],[3.8949675523700438,25.498156902884112],[3.853460930506155,25.428046553409356],[3.797602414522217,25.366560752342917],[3.7295386158616766,25.316077273341982],[3.6518851881364403,25.278551030320266],[3.567626309025444,25.25543670701288],[3.4800000000000004,25.24763096138966],[3.3923736909745563,25.25543670701288],[3.30811481186356,25.278551030320266],[3.230461384138324,25.316077273341982],[3.1623975854777835,25.366560752342917],[3.106539069493846,25.428046553409356],[3.0650324476299566,25.498156902884112],[3.0394727960871637,25.57418480393311],[3.03084235794024,25.653199999999963],[3.0394727960871637,25.73216289757996],[3.0650324476299566,25.808041865571695],[3.106539069493846,25.877929322171752],[3.162397585477784,25.93915220501075],[3.2304613841383247,25.989372767781138],[3.3081148118635615,26.026676122749052],[3.3923736909745577,26.049641517820053],[3.4800000000000013,26.05739496695326],[3.5676263090254454,26.049641517820053],[3.6518851881364416,26.026676122749052],[3.7295386158616783,25.989372767781138],[3.7976024145222183,25.93915220501075],[3.853460930506156,25.877929322171752],[3.894967552370044,25.808041865571695],[3.920527203912837,25.73216289757995],[3.9291576420597614,25.653199999999963]]') != 0);
QUERY PLAN
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--LIST SUBQUERY 2
   `--COMPOUND QUERY
      |--LEFT-MOST SUBQUERY
      |  `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
      `--INTERSECT USING TEMP B-TREE
         `--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan
1
Run Time: real 1.813 user 1.594897 sys 0.087140

It also gives a better result because of the tighter selection (I get a count = 1 as opposed to a count = 2 for my bbox query, where I search for latitude BETWEEN min_lat AND max_lat AND longitude BETWEEN min_lng AND max_lng). But the bbox query is much faster. This INTERSECT approach is also much more complicated for me to construct programmatically. So, I think I will stick to my bbox methodology. But how on earth would I have discovered this if you hadn't been around? I ask this seriously, in a quest to understand what is going on.

And why are RTree queries (geopoly uses RTree in the background) so slow as compared to, say, FTS5, which is like magic?

09:24 Reply: another performance question related to `geopoly` (artifact: 29afe82b4e user: punkish)

I can't figure out why my geopoly code is slow and how to make it fast. It also seems that either I am asking the question in the wrong way or there is not much interest or insight in the workings of geopoly (RTree) on this forum, since three of my geopoly-related questions have gone without a reply. Consequently, I have decided to stop using geopoly, especially since a standard BETWEEN clause seems to work very well (as detailed in my last example above).

In case someone else stumbles upon this with a similar problem – I am using the turf.js library to create a bbox around my point with the provided radius (actually I first create a buffer and then create a bbox for the buffer). That gives me the min and max lat and lng, which I use in my latitude BETWEEN min_lat AND max_lat and longitude BETWEEN min_lng AND max_lng SQL. Hope this is helpful.
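
The SQL ends up looking something like this (a sketch; @min_lat, @max_lat, @min_lng and @max_lng are bound from the turf.js bbox, and the table and column names are the ones from my schema elsewhere in this thread):

SELECT treatmentId, latitude, longitude
FROM materialsCitations
WHERE latitude BETWEEN @min_lat AND @max_lat
AND longitude BETWEEN @min_lng AND @max_lng;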

04:20 Post: another performance question related to `geopoly` (artifact: 28ede1c8f3 user: punkish)

I am back with a performance question about geopoly_within using the geopoly extension. First, the schema (showing only the columns relevant to the queries)


-- ~350K rows
CREATE TABLE treatments (  
    treatmentId TEXT NOT NULL UNIQUE, 
    treatmentTitle TEXT
);

-- ~397008 rows of which ~135K have latitude and longitude values
CREATE TABLE materialsCitations ( 
    materialsCitationId TEXT NOT NULL, 
    treatmentId TEXT NOT NULL, 
    latitude REAL, 
    longitude REAL,
    UNIQUE (materialsCitationId, treatmentId)
);

CREATE VIRTUAL TABLE vtreatments USING FTS5(treatmentId, fullText);
/* vtreatments(treatmentId,fullText) */;

CREATE VIRTUAL TABLE vlocations USING geopoly(treatmentId, materialsCitationId)
/* vlocations(_shape,treatmentId,materialsCitationId) */;

Since I only have point data, I have filled the vlocations table with really tiny polygons constructed around each point, roughly as sketched below.
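
(The radius and vertex count below are placeholders for illustration, not the exact values I used.)

INSERT INTO vlocations(treatmentId, materialsCitationId, _shape)
SELECT treatmentId, materialsCitationId,
       geopoly_regular(longitude, latitude, 0.0001, 4)
FROM materialsCitations
WHERE latitude IS NOT NULL AND longitude IS NOT NULL;

Now, on to the queries…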

First, a fulltext search which is very fast, as expected

sqlite> .timer on
sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM vtreatments WHERE vtreatments MATCH 'Meshram';
QUERY PLAN
`--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
Run Time: real 0.000 user 0.000070 sys 0.000010

sqlite> SELECT Count(*) AS num FROM vtreatments WHERE vtreatments MATCH 'Meshram';
35
Run Time: real 0.014 user 0.001111 sys 0.003128

Then, given the following poly, using a radius of 50 kms around "lat":25.6532, "lng":3.48

poly = '[[3.9291576420597614,25.653199999999963],[3.9205272039128367,25.57418480393311],[3.8949675523700438,25.498156902884112],[3.853460930506155,25.428046553409356],[3.797602414522217,25.366560752342917],[3.7295386158616766,25.316077273341982],[3.6518851881364403,25.278551030320266],[3.567626309025444,25.25543670701288],[3.4800000000000004,25.24763096138966],[3.3923736909745563,25.25543670701288],[3.30811481186356,25.278551030320266],[3.230461384138324,25.316077273341982],[3.1623975854777835,25.366560752342917],[3.106539069493846,25.428046553409356],[3.0650324476299566,25.498156902884112],[3.0394727960871637,25.57418480393311],[3.03084235794024,25.653199999999963],[3.0394727960871637,25.73216289757996],[3.0650324476299566,25.808041865571695],[3.106539069493846,25.877929322171752],[3.162397585477784,25.93915220501075],[3.2304613841383247,25.989372767781138],[3.3081148118635615,26.026676122749052],[3.3923736909745577,26.049641517820053],[3.4800000000000013,26.05739496695326],[3.5676263090254454,26.049641517820053],[3.6518851881364416,26.026676122749052],[3.7295386158616783,25.989372767781138],[3.7976024145222183,25.93915220501075],[3.853460930506156,25.877929322171752],[3.894967552370044,25.808041865571695],[3.920527203912837,25.73216289757995],[3.9291576420597614,25.653199999999963]]'

a geopoly_within search which is not as fast as an FTS search, but still reasonable

sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM vlocations WHERE geopoly_within(_shape, poly) != 0;
QUERY PLAN
`--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan
Run Time: real 0.000 user 0.000087 sys 0.000019

sqlite> SELECT Count(*) AS num FROM vlocations WHERE geopoly_within(_shape, poly) != 0;
2
Run Time: real 1.606 user 1.527031 sys 0.059115

Finally, the really problematic query wherein I combine the above two queries

sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId JOIN vlocations ON treatments.treatmentId = vlocations.treatmentId WHERE vtreatments MATCH 'Meshram' AND geopoly_within(_shape, poly) != 0;
QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--SCAN TABLE vlocations VIRTUAL TABLE INDEX 4:fullscan
Run Time: real 0.002 user 0.000289 sys 0.001414

sqlite> SELECT Count(*) AS num FROM treatments JOIN vtreatments ON treatments.treatmentId
= vtreatments.treatmentId JOIN vlocations ON treatments.treatmentId = vlocations.treatmentId WHERE vtreatments MATCH 'Meshram' AND geopoly_within(_shape, poly) != 0;
2
Run Time: real 46.382 user 44.862766 sys 1.394483

For comparison, below I am using FTS5 but not the VIRTUAL vlocations table

sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude = 25.6532 AND longitude = 3.48;
QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (ANY(deleted) AND treatmentId=?)
Run Time: real 0.001 user 0.000278 sys 0.000701

sqlite> SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude = 25.6532 AND longitude = 3.48;
2
Run Time: real 0.047 user 0.002550 sys 0.012607

And finally, the same query as above except that I am searching inside a box to simulate a geopoly_within query

sqlite> EXPLAIN QUERY PLAN SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude BETWEEN 25.6530 AND 25.6534 AND longitude BETWEEN 3.47 AND 3.49;
QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING COVERING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (ANY(deleted) AND treatmentId=?)
Run Time: real 0.000 user 0.000237 sys 0.000185

sqlite> SELECT Count(*) AS num FROM treatments JOIN materialsCitations ON treatments.treatmentId = materialsCitations.treatmentId JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId WHERE vtreatments MATCH 'meshram' AND latitude BETWEEN 25.6530 AND 25.6534 AND longitude BETWEEN 3.47 AND 3.49;
2
Run Time: real 0.002 user 0.001776 sys 0.000296

Summary: the VIRTUAL locations queries are a few orders of magnitude slower than FTS5 queries, but combining VIRTUAL location with VIRTUAL text really slows down to a crawl. Seems like I am better off completely abandoning RTree index and geopoly. I am using the latter only because it enables me to use geopoly_within, thereby allowing me to do radius searches.

Question: Am I doing something wrong? Can I improve the performance of geopoly, esp when joining a geopoly table to other tables?

2020-08-17
07:22 Edit: geopoly performance (artifact: 3339cc99b2 user: punkish)

I am back with a question about geopoly and performance. I have tables like so

CREATE TABLE a (a_id);
CREATE TABLE b (b_id, a_id, lat, lon);

sqlite> SELECT Count(*) FROM a;
370313
sqlite> SELECT Count(*) FROM b;
397008
sqlite> SELECT Count(*) FROM b WHERE lon != '' AND lat != '';
134681

I create a geopoly table and load it with tiny triangles generated around each point with a delta of 0.0001 degree (from what I understand, geopoly and r*tree tables can't deal with points… they need polys)

CREATE VIRTUAL TABLE vloc USING geopoly(a_id, b_id);
INSERT INTO vloc (a_id, b_id, _shape) SELECT a.a_id, b_id, geopoly_regular(lon, lat, 0.0001, 3) AS _shape FROM a JOIN b ON a.a_id = b.a_id WHERE lon != '' AND lat != '';
sqlite> SELECT Count(*) FROM vloc;
134681

I want to be able to find all rows in table a within say, 10 kms of a given lon, lat. I use the following two queries as test, and a radius of 0.1 (1 deg is ~111 kms near the equator, so I am just using 0.1 deg as an approximation here)

sqlite> SELECT DISTINCT a_id FROM vloc WHERE geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 4)) != 0;
(… 8 rows are returned …)
Run Time: real 0.400 user 0.348746 sys 0.048510
sqlite> SELECT DISTINCT a.a_id FROM a JOIN b ON a.a_id = b.a_id WHERE lat BETWEEN -0.1 AND 0.1 AND lon BETWEEN -0.1 AND 0.1;
(… 9 rows are returned …)
Run Time: real 0.027 user 0.001259 sys 0.006363
sqlite>

The results are not going to be exact because the queries are ever so different, but they are close enough for me. However, using geopoly takes 14 times longer than just doing a simple BETWEEN over a JOIN.

The reason I want to use geopoly is that I can use other geospatial libraries to convert from lat,lon to cartesian coordinates and allow more natural queries such as "all the rows within x kms of a given point", but some of the results are two or even three orders of magnitude slower with geopoly.
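
Just to make the intent concrete, the query I am after looks something like this (a sketch; it leans on the same ~111 kms per degree approximation as above, and @lng, @lat and @radius_km are placeholders bound by the application):

SELECT DISTINCT a_id
FROM vloc
WHERE geopoly_within(_shape, geopoly_regular(@lng, @lat, @radius_km / 111.0, 8)) != 0;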

What am I doing wrong? How can I improve the speed of geopoly?

Update: In my application, I use geopoly_within(_shape, @poly) where I compute the value for poly using a geospatial library so I can use a radius in kilometers. The SELECT time is even slower when using a circle. In fact, even with geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 20)) in the above query, the performance degrades from 0.400 to 0.636

07:00 Post: geopoly performance (artifact: 6cde2e98eb user: punkish)

I am back with a question about geopoly and performance. I have tables like so

CREATE TABLE a (a_id);
CREATE TABLE b (b_id, a_id, lat, lon);

sqlite> SELECT Count(*) FROM a;
370313
sqlite> SELECT Count(*) FROM b;
397008
sqlite> SELECT Count(*) FROM b WHERE lon != '' AND lat != '';
134681

I create a geopoly table and load it with tiny triangles generated around each point with a delta of 0.0001 degree (from what I understand, geopoly and r*tree tables can't deal with points… they need polys)

CREATE VIRTUAL TABLE vloc USING geopoly(a_id, b_id);
INSERT INTO vloc (a_id, b_id, _shape) SELECT a.a_id, b_id, geopoly_regular(lon, lat, 0.0001, 3) AS _shape FROM a JOIN b ON a.a_id = b.a_id WHERE lon != '' AND lat != '';
sqlite> SELECT Count(*) FROM vloc;
134681

I want to be able to find all rows in table a within say, 10 kms of a given lon, lat. I use the following two queries as test, and a radius of 0.1 (1 deg is ~111 kms near the equator, so I am just using 0.1 deg as an approximation here)

sqlite> SELECT DISTINCT a_id FROM vloc WHERE geopoly_within(_shape, geopoly_regular(0, 0, 0.1, 4)) != 0;
(… 8 rows are returned …)
Run Time: real 0.400 user 0.348746 sys 0.048510
sqlite> SELECT DISTINCT a.a_id FROM a JOIN b ON a.a_id = b.a_id WHERE lat BETWEEN -0.1 AND 0.1 AND lon BETWEEN -0.1 AND 0.1;
(… 9 rows are returned …)
Run Time: real 0.027 user 0.001259 sys 0.006363
sqlite>

The results are not going to be exact because the queries are ever so different, but they are close enough for me. However, using geopoly takes 14 times longer than just doing a simple BETWEEN over a JOIN.

The reason I want to use geopoly is that I can use other geospatial libraries to convert from lat,lon to cartesian coordinates and allow more natural queries such as "all the rows within x kms of a given point", but some of the results are two or even three orders of magnitude slower with geopoly.

What am I doing wrong? How can I improve the speed of geopoly?

2020-08-10
09:23 Edit: using geopoly to find rows within a buffer around an input point (artifact: f5148ce4dc user: punkish)

I have geographic points in lat,lng and I want to be able to SELECT rows within a buffer radius of say, 10 kms, of a given lat,lng. Here is how I am going about it:

  1. points table looks like so, and has a couple of hundred thousand rows

    CREATE TABLE foo (fooId INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
    
  2. create a virtual geopoly table

    CREATE VIRTUAL TABLE fooLoc USING geopoly(fooId);
    INSERT INTO fooLoc(fooId, _shape) SELECT fooId, '[[' || longitude || ',' || latitude || '],[' || longitude || ',' || latitude || '],[' || longitude || ',' || latitude || '],[' || longitude || ',' || latitude || ']]' AS _shape FROM foo;
    
  3. given an input lat,lng pair, compute a geojson buffer (I am using nodejs to do all this work)

    const geojson = require('geojson')
    const buffer = require('@turf/buffer')
    const Database = require('better-sqlite3')
    const db = new Database('mydata.sqlite')
    
    const point = {
        latitude: -4.5077777,
        longitude: -56.289722
    }
    
    const gj = geojson.parse(point, { Point: ['latitude', 'longitude'] })
    const buffered = buffer(gj, 10, {units: 'kilometers'})
    const p = JSON.stringify(buffered.geometry.coordinates)
    
    // Now I use the buffer string in my `SELECT` statement
    const sql = "SELECT fooId, latitude, longitude FROM foo WHERE geopoly_contains_point(@p, longitude, latitude) IS NOT NULL"
    const records = db.prepare(sql).all({ p: p })
    

    But I get nothing. Just an empty array. Though I have shown the js code above, I've tried this directly in the SQLite console and the same result… an empty array.

What am I doing wrong?

2020-08-09
19:00 Reply: using geopoly to find rows within a buffer around an input point (artifact: bcf67358a8 user: punkish)

never mind, I figured it out. The buffer produces a multipolygon (even though it is a simple polygon), so it is represented by an array of arrays of coordinates. I have to use the one and only poly in that array. The following does the trick

const p = JSON.stringify(buffered.geometry.coordinates[0])

18:54 Edit: using geopoly to find rows within a buffer around an input point (artifact: ac8c6f04e6 user: punkish)

I have geographic points in lat,lng and I want to be able to SELECT rows within a buffer radius of say, 10 kms, of a given lat,lng. Here is how I am going about it:

  1. points table looks like so, and has a couple of hundred thousand rows

    CREATE TABLE foo (fooId INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
    
  2. create a virtual geopoly table

    CREATE VIRTUAL TABLE fooLoc USING geopoly(fooId);
    INSERT INTO fooLoc(fooId, _shape) SELECT fooId, '[[' || longitude || ',' || latitude || '],[' || longitude || ',' || latitude || '],[' || longitude || ',' || latitude || '],[' || longitude || ',' || latitude || ']]' AS _shape FROM foo;
    
  3. given an input lat,lng pair, compute a geojson buffer (I am using nodejs to do all this work)

    const geojson = require('geojson')
    const buffer = require('@turf/buffer')
    const Database = require('better-sqlite3')
    const db = new Database('mydata.sqlite')
    
    const point = {
        latitude: -4.5077777,
        longitude: -56.289722
    }
    
    const gj = geojson.parse(point, { Point: ['latitude', 'longitude'] })
    const buffered = buffer(gj, 10, {units: 'kilometers'})
    const p = JSON.stringify(buffered.geometry.coordinates)
    
    // Now I use the buffer string in my `SELECT` statement
    const sql = "SELECT fooId, latitude, longitude FROM foo WHERE geopoly_contains_point(@p, @longitude, @latitude)"
    const records = db.prepare(sql).all({
        p: p,
        longitude: point.longitude,
        latitude: point.latitude,
    })
    

    But I get nothing. Just an empty array. Though I have shown the js code above, I've tried this directly in the SQLite console and the same result… an empty array.

What am I doing wrong?

18:50 Edit: using geopoly to find rows within a buffer around an input point (artifact: 7457a2669c user: punkish)

I have geographic points in lat,lng and I want to be able to SELECT rows within a buffer radius of say, 10 kms, of a given lat,lng. Here is how I am going about it:

  1. points table looks like so, and has a couple of hundred thousand rows

    CREATE TABLE foo (fooId INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
    
  2. create a virtual geopoly table

    CREATE VIRTUAL TABLE fooLoc USING geopoly(fooId);
    INSERT INTO fooLoc(fooId, _shape) SELECT fooId, '[[' || latitude || ',' || longitude || '],[' || latitude || ',' || longitude || '],[' || latitude || ',' || longitude || '],[' || latitude || ',' || longitude || ']]' AS _shape FROM foo;
    
  3. given an input lat,lng pair, compute a geojson buffer (I am using nodejs to do all this work)

    const geojson = require('geojson')
    const buffer = require('@turf/buffer')
    const Database = require('better-sqlite3')
    const db = new Database('mydata.sqlite')
    
    const point = {
        latitude: -4.5077777,
        longitude: -56.289722
    }
    
    const gj = geojson.parse(point, { Point: ['latitude', 'longitude'] })
    const buffered = buffer(gj, 10, {units: 'kilometers'})
    const p = JSON.stringify(buffered.geometry.coordinates)
    
    // Now I use the buffer string in my `SELECT` statement
    const sql = "SELECT fooId, latitude, longitude FROM foo WHERE geopoly_contains_point(@p, @longitude, @latitude)"
    const records = db.prepare(sql).all({
        p: p,
        longitude: point.longitude,
        latitude: point.latitude,
    })
    

    But I get nothing. Just an empty array. Though I have shown the js code above, I've tried this directly in the SQLite console and the same result… an empty array.

What am I doing wrong?

18:42 Post: using geopoly to find rows within a buffer around an input point (artifact: b0112b818a user: punkish)

I have geographic points in lat,lng and I want to be able to SELECT rows within a buffer radius of say, 10 kms, of a given lat,lng. Here is how I am going about it:

  1. points table looks like so, and has a couple of hundred thousand rows

    CREATE TABLE foo (fooId INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
    
  2. create a virtual geopoly table

    CREATE VIRTUAL TABLE fooLoc USING geopoly(fooId);
    INSERT INTO fooLoc(fooId, _shape) SELECT fooId, '[[' || latitude || ',' || longitude || '],[' || latitude || ',' || longitude || '],[' || latitude || ',' || longitude || '],[' || latitude || ',' || longitude || ']]' AS _shape FROM foo;
    
  3. given an input lat,lng pair, compute a geojson buffer (I am using nodejs to do all this work)

    const geojson = require('geojson')
    const buffer = require('@turf/buffer')
    const Database = require('better-sqlite3')
    const db = new Database('mydata.sqlite')
    
    const point = {
        latitude: -4.5077777,
        longitude: -56.289722
    }
    
    const gj = geojson.parse(point, { Point: ['latitude', 'longitude'] })
    const buffered = buffer(gj, 10, {units: 'kilometers'})
    const p = JSON.stringify(buffered.geometry.coordinates)
    
    // Now I use the buffer string in my `SELECT` statement
    const sql = "SELECT fooId, latitude, longitude FROM foo WHERE geopoly_contains_point(@p, latitude, longitude)"
    const records = db.prepare(sql).all({ p: p })
    

    But I get nothing. Just an empty array. Though I have shown the js code above, I've tried this directly in the SQLite console and the same result… an empty array.

What am I doing wrong?

2020-03-17
17:54 Reply: Convert datetime string to second since Epoch with millisecond precision (artifact: b5f94212df user: punkish)

how about

sqlite> SELECT strftime('%s', a) +  strftime('%f', a) - strftime('%S', a) FROM
   ...> (SELECT '2016-06-13T09:36:34.123Z' a);
1465810594.123
sqlite>
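
(This works because strftime('%s', a) is the whole-second Unix epoch, strftime('%f', a) gives the seconds of the minute with milliseconds as SS.SSS, and strftime('%S', a) gives the whole seconds of the minute, so adding %f and subtracting %S grafts the fractional part onto the epoch: 1465810594 + 34.123 - 34 = 1465810594.123.)
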
2020-03-15
15:01 Reply: (email) Lack of paragraph wrapping in text/plain emails makes them hard to read (artifact: 36aeeedde1 user: punkish)

The format=flowed setting is unreliable (not all clients support it, or support it well).

I think therein lies the rub. One could respond (somewhat justifiably) that one gets a client that supports format=flowed.

But that is the wrong end of the process to focus on. Generation, not consumption, is where you support all client scenarios the best.

Perhaps, but not in the eyes of the one who is offering this forum. SQLite, the main product of DRH and his crew, and whoever else contributes to its source directly, [^1] does a seemingly impossible and incredible job of supporting all kinds of clients (OS and hardware). Many of them I've never even heard of, let alone ever used knowingly. In his view, the Fossil-based forum does a better job of supporting the users, and many users agree, some don't. That's how it goes.

[^1]: by the way, I really don't know who all work on this lovely little db besides Richard. I have many folks I can thank for their help – Dan Kennedy, Keith Medcalf, Simon Slavin (there used to be a gentleman named Igor – a wizard with SQL – he seems to not be around any more). But who all actually work on creating SQLite on a day-to-day basis? Many thanks to them, and to all those who love helping others.

13:52 Reply: email gateway Confusing authorship on edited posts (artifact: 0bbb38ab42 user: punkish)

yes, I would second this as a problem I too have noticed and have been confused by. I received a series of posts by DRH that were obviously not from Richard. I am assuming Richard edited or posted those messages to this forum, either picking them up from the mailing list or editing something in them. As a result, the original authorship was obfuscated (phew, I've wanted to use that word for a while now)

13:06 Reply: (email) Lack of paragraph wrapping in text/plain emails makes them hard to read (artifact: a05f706d7f user: punkish)

most modern email clients have a format=flowed setting which makes the text flow to the width of the mail client. You are welcome to use backticks reserved for code blocks, but that is really not necessary. My email client shows your text with line breaks just as well as it would have had you written a normal paragraph not restricted by the technologies of the olden days.

2020-03-13
20:53 Edit reply: a css to improve readability of this forum (artifact: 3db0700285 user: punkish)

Update: I bought Cascadea and installed it, and it works perfectly. Worth all of three bucks it cost me. Recommended if you are a Safari user like me.

Hi everyone,

I have cleaned up the CSS I am using and also added instructions. Both are available at https://gist.github.com/punkish/309666d1a001061a521048869edc73cd

The instructions are for Firefox (at this time) because it has a free extension that allows adding site-specific CSS. Personally I am a Safari user, and I've found Cascadea that claims to do the same thing. I haven't tried it yet, but will soon. In the meantime, I can attest that the above instructions work very well for Firefox. And, I am sure there must be something similar possible with Chrome.

Hope this is useful (for some of you).

20:39 Reply: a css to improve readability of this forum (artifact: 604dc8433c user: punkish)

Hi everyone,

I have cleaned up the CSS I am using and also added instructions. Both are available at https://gist.github.com/punkish/309666d1a001061a521048869edc73cd

The instructions are for Firefox (at this time) because it has a free extension that allows adding site-specific CSS. Personally I am a Safari user, and I've found Cascadea that claims to do the same thing. I haven't tried it yet, but will soon. In the meantime, I can attest that the above instructions work very well for Firefox. And, I am sure there must be something similar possible with Chrome.

Hope this is useful (for some of you).

15:29 Reply: JOINs with FTS5 virtual tables are very slow (artifact: 509bdbe534 user: punkish)

Yes, My experience was similar. Running analyze on the two culprit tables in a 5 GB database took a couple of seconds, maybe less. That said, I will follow Simon’s advice and run analyze consciously, not automatically via a program. Should do the trick for me. Thanks all, especially to Dan Kennedy, for helping solve this.

14:34 Reply: JOINs with FTS5 virtual tables are very slow (artifact: f17e16e14b user: punkish)

Thanks Simon, good to know this. My db is primarily read only. Once I fill it with data, it is used to power an API. So I can fill the db, create the indexes, run ANALYZE on all the tables, and be ready.
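
In practice, the tail end of the load can just be something like this (a sketch; table names as used elsewhere in this thread):

ANALYZE treatments;
ANALYZE materialsCitations;
-- or simply analyze everything in one go
ANALYZE;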

14:15 Reply: JOINs with FTS5 virtual tables are very slow (artifact: 9ef4087b98 user: punkish)

jeebus! that was it! I ran ANALYZE on both the treatments and materialsCitations tables. The QUERY PLAN changed ever so slightly (the order of searching the treatments and materialsCitations tables was flipped)

QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_treatmentId (deleted=? AND treatmentId=?)
`--USE TEMP B-TREE FOR GROUP BY

But the query was like night and day

sqlite> SELECT
   ...>     collectionCode,
   ...>     Count(collectionCode) AS c
   ...> FROM
   ...>     materialsCitations
   ...>     JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
   ...>     JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId
   ...> WHERE
   ...>     collectionCode != ''
   ...>     AND materialsCitations.deleted = 0
   ...>     AND treatments.deleted = 0
   ...>     AND vtreatments MATCH "carabus"
   ...> GROUP
   ...>     BY collectionCode;
collectionCode  c
--------------  ----------
CFDD            1
CFDD, CTAY      3
CFDD, CTAY, CZ  23
CFDD, CTAY, CZ  2
IZAS, MPU, BMN  1
NMNHS           14
NMP             6
ZISP            2
ZISP, CAK, CBK  1
ZISP, CFDD, CT  1
Run Time: real 0.259 user 0.014076 sys 0.064197
sqlite>

boom! thanks Dan! you saved me hours of headache trying to rewrite my query making routines.

A question – if ANALYZE makes such a difference, why doesn't SQLite just do it anyway? I mean, is it *not* advised to ANALYZE under certain circumstances? Keep in mind, I didn't first INDEX the tables and *then* change the `deleted` column (thereby possibly necessitating ANALYZE). The `deleted` column was set to 0 from the get go.

14:07 Reply: a css to improve readability of this forum (artifact: 82cf265748 user: punkish)

there is still something wrong with your implementation, and I don't yet know what. This is how it looks with your implementation

[screenshot: drafts2]

and this is how it looks on my computer

[screenshot: my laptop]

I will investigate this further later, but first, I have to go make some train travel changes because of the rapidly developing Covid situation here in Madrid.

13:11 Reply: a css to improve readability of this forum (artifact: 6c01d278a7 user: punkish)

The proposed css is targetted at use on a cellphone. The text is too large for a desktop and leaves the right half of a wide screen empty.

Please see my reply to Richard's post. First, make sure you use my mods on top of the existing website, not in place of the existing website's original css.

Two, yes you are correct, the css is mobile-first, but it reduces the body width to 50% on large screens, thereby improving readability (I did say imo). Wide columns from left to right make for a poor reading experience. On my Macbook at 1400px, a 50% wide column of text is a very pleasant reading experience.

That said, you are also probably correct that more judicious media queries could be used. I hacked this in about 10 mins without using any library. I've put my mods as a gist so please do fork it and make it better. I would be very happy to use a better version (without making it onerous – that is, using big external libraries).

Many thanks.

13:06 Edit reply: a css to improve readability of this forum (artifact: a5752ba304 user: punkish)

yes, I see a problem right away. You see, I can't disable your css, so I am using my mods as a supplement. That is, they are applied after your original css is applied, thereby overriding a few of its directives. If you completely replace your css with mine, then it will not work as it won't have much of the foundational directives. In other words, my css cascades over your css. If you use it like that, it will (should) look a lot better.

13:04 Reply: a css to improve readability of this forum (artifact: bbbc945880 user: punkish)

yes, I see a problem right away. You see, I can't disable your css, so I am using my mods as a supplement. That is, they are applied after your original css is applied, thereby overriding a few of its directives. If you completely replace your css with mine, then it will not work as it won't have much of the foundational directives. In other words, my css cascades over your css. If you use it like that, it will (should) look a lot better.

13:00 Reply: a css to improve readability of this forum (artifact: 36cb08c001 user: punkish)

yes, something is wrong in the way it is working on the draft2 website. It looks waaaay better on my computer. I will investigate and see what is going on.

12:40 Edit: a css to improve readability of this forum (artifact: 55a14d3904 user: punkish)

Personally, I not only don't mind this forum but also like it in many respects, especially the ability to embed code with styling, and to be able to edit mistakes. I do have a couple of small questions:

  1. Can the email mailing be turned off/on by a setting?

  2. Every time I edit an existing post, I get mailed a copy. I love the ability to edit mistakes in an already committed post, but I would not like to spam everyone with my edits. They can always see the latest version on the web anyway.

That said, I am not a great typographer/designer but have opinions (like everyone else). I have created the following little css that (imo) greatly improves the look and readability of this forum (see below for the css). But first, I would request the forum designer to consider the following three requests:

  1. prefix all the css with a namespace, for example, FOSSIL or SQLITE would do. It would greatly help in making a custom css.

  2. to not use inline css to style the replies in the hierarchical tree. Instead, use classnames (with the above suggested prefix, of course). This too would allow styling the custom css.

  3. And finally, the forum is serving about 1100 lines of css (give or take a few) but really seems to use only a handful of lines (the stuff up until line 275). Why not get rid of the fossil-specific css that has little use on this forum website?

Now, my mods (also available at https://gist.github.com/punkish/309666d1a001061a521048869edc73cd in case someone else wants to improve it further)

body {
    font-family: serif;
    font-size:18pt;
}

input[type=text] {
  height: 40px;
}
input[type=submit] {
    background-color: lightblue;
    border: 1px solid darkblue;
    padding: 10px;
  height: 40px;
    font-size: 12px;
    border-radius: 10px;
}

a {
    color: #0000ff;
    text-decoration: underline;
}

.title {
    text-align: center;
    float: unset;
    line-height: 1rem;
}
.title h1 {
    display:block;
    color: darkblue;
}
.title h1:after {
    content: " ";
    color: #777;
    font-weight: normal;
}
.status {
    font-family: sans-serif;
    position: absolute;
    top: 0;
    right: 25px;
}
.mainmenu {
    border: none;
    border-top: 1px solid;
    border-bottom: 1px solid;
    border-radius: 0;
    background: unset !important;
}
.content p {
  line-height: 2rem;
}
.forumHierRoot {
  padding: 2rem;
}
div.forumHier, div.forumTime {
  border: none;
  border-left: 1px solid grey;
  border-bottom: 1px solid grey;
  border-radius: 0 10px;
  padding: 10px;
  background-color: #f5f5f5;
}
pre {
  padding: 1rem;
    background-color: lightgrey;
  font-family: menlo, consolas, monospace;
  font-size: 12px;
}
@media screen and (max-width: 600px) {
  /* Spacing for mobile */

}
@media screen and (min-width: 750px) {
  /* Spacing for desktop */
  body {
      width: 50%;
  }
}
@media screen and (max-width: 1200px) {
  /* Special declarations for narrow desktop or wide mobile */

}

Hope this helps others

12:37 Edit: a css to improve readability of this forum (artifact: e2890b0bcc user: punkish)

Personally, I not only don't mind this forum but also like it in many respects, especially the ability to embed code with styling, and to be able to edit mistakes. I do have a couple of small questions:

  1. Can the email mailing be turned off/on by a setting?

  2. Every time I edit an existing post, I get mailed a copy. I love the ability to edit mistakes in an already committed post, but I would not like to spam everyone with my edits. They can always see the latest version on the web anyway.

That said, I am not a great typographer/designer but have opinions (like everyone else). I have created the following little css that (imo) greatly improves the look and readability of this forum (see below for the css). But first, I would request the forum designer to consider the following three requests:

  1. prefix all the css with a namespace, for example, FOSSIL or SQLITE would do. It would greatly help in making a custom css.

  2. to not use inline css to style the replies in the hierarchical tree. Instead, use classnames (with the above suggested prefix, of course). This too would allow styling the custom css.

  3. And finally, the forum is serving about 1100 lines of css (give or take a few) but really seems to use only a handful of lines (the stuff up until line 275). Why not get rid of the fossil-specific css that has little use on this forum website?

Now, my mods (also available at https://gist.github.com/punkish/309666d1a001061a521048869edc73cd in case someone else wants to improve it further)

body {
    font-family: serif;
    font-size:18pt;
}

input[type=text] {
  height: 40px;
}
input[type=submit] {
    background-color: lightblue;
    border: 1px solid darkblue;
    padding: 10px;
  height: 40px;
    font-size: 12px;
    border-radius: 10px;
}

a {
    color: #0000ff;
    text-decoration: underline;
}

.title {
    text-align: center;
    float: unset;
    line-height: 1rem;
}
.title h1 {
    display:block;
    color: darkblue;
}
.title h1:after {
    content: " ";
    color: #777;
    font-weight: normal;
}
.status {
    font-family: sans-serif;
    position: absolute;
    top: 0;
    right: 25px;
}
.mainmenu {
    border: none;
    border-top: 1px solid;
    border-bottom: 1px solid;
    border-radius: 0;
    background: unset !important;
}
.content p {
  line-height: 2rem;
}
.forumHierRoot {
  padding: 2rem;
}
div.forumHier, div.forumTime {
  border: none;
  border-left: 1px solid grey;
  border-bottom: 1px solid grey;
  border-radius: 0 10px;
  padding: 10px;
  background-color: #f5f5f5;
}
pre {
  padding: 1rem;
    background-color: lightgrey;
  font-family: menlo, consolas, monospace;
  font-size: 12px;
}
@media screen and (max-width: 600px) {
  /* Spacing for mobile */

}
@media screen and (min-width: 750px) {
  /* Spacing for desktop */
  body {
      width: 50%;
  }
}
@media screen and (max-width: 1200px) {
  /* Special declarations for narrow desktop or wide mobile */

}

Hope this helps others

12:30 Post: a css to improve readability of this forum (artifact: 04c71a9817 user: punkish)

Personally, I not only don't mind this forum but also like it in many respects, especially the ability to embed code with styling, and to be able to edit mistakes. I do have a couple of small questions:

  1. Can the email mailing be turned off/on by a setting?

  2. Every time I edit an existing post, I get mailed a copy. I love the ability to edit mistakes in an already committed post, but I would not like to spam everyone with my edits. They can always see the latest version on the web anyway.

That said, I am not a great typographer/designer but have opinions (like everyone else). I have created the following little css that (imo) greatly improves the look and readability of this forum (see below for the css). But first, I would request the forum designer to consider the following three requests:

  1. prefix all the css with a namespace, for example, FOSSIL or SQLITE would do. It would greatly help in making a custom css.

  2. to not use inline css to style the replies in the hierarchical tree. Instead, use classnames (with the above suggested prefix, of course). This too would allow styling the custom css.

  3. And finally, the forum is serving about 1100 lines of css (give or take a few) but really seems to use only a handful of lines (the stuff up until line 275). Why not get rid of the fossil-specific css that has little use on this forum website?

Now, my mods (also available at https://gist.github.com/punkish/309666d1a001061a521048869edc73cd in case someone else wants to improve it further)

body {
    font-family: serif;
    font-size:18pt;
}

input[type=submit] {
    background-color: lightblue;
    border: 1px solid darkblue;
    padding: 10px;
    font-size: large;
    border-radius: 10px;
}

a {
    color: #0000ff;
    text-decoration: underline;
}

.title {
    text-align: center;
    float: unset;
    line-height: 1rem;
}
.title h1 {
    display:block;
    color: darkblue;
}
.title h1:after {
    content: " ";
    color: #777;
    font-weight: normal;
}
.status {
    font-family: sans-serif;
    position: absolute;
    top: 0;
    right: 25px;
}
.mainmenu {
    border: none;
    border-top: 1px solid;
    border-bottom: 1px solid;
    border-radius: 0;
    background: unset !important;
}
.content p {
  line-height: 2rem;
}
.forumHierRoot {
  padding: 2rem;
}
div.forumHier, div.forumTime {
  border: none;
  border-left: 1px solid grey;
  border-bottom: 1px solid grey;
  border-radius: 0 10px;
  padding: 10px;
  background-color: #f5f5f5;
}
pre {
  padding: 1rem;
    background-color: lightgrey;
  font-family: menlo, consolas, monospace;
  font-size: 12px;
}
@media screen and (max-width: 600px) {
  /* Spacing for mobile */

}
@media screen and (min-width: 750px) {
  /* Spacing for desktop */
  body {
      width: 50%;
  }
}
@media screen and (max-width: 1200px) {
  /* Special declarations for narrow desktop or wide mobile */

}

Hope this helps others

11:07 Edit: JOINs with FTS5 virtual tables are very slow (artifact: 5f79ba0c13 user: punkish)

I had submitted the following question a few days ago to which Dan Kennedy very kindly replied. But the problem wasn't really resolved so I am resubmitting it. This time I am trying to be less clever and not submitting pseudo-code. Instead, these are actual queries that I ran about 5 mins ago. Here goes.

Note: the total row counts are as follows

  • treatments 308498
  • materialsCitations 269546

The following query took 170s and returned 10 rows. Let's call this Query 1

Query 1

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
	JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND treatments.deleted = 0
	AND vtreatments MATCH "carabus"
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)
`--USE TEMP B-TREE FOR GROUP BY

Let's break it down into sub-queries. The following takes 668ms

Query 2

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
    JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
    AND treatments.deleted = 0
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
`--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)

The following takes 37ms

Query 3

SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus";

-- QUERY PLAN
 SCAN TABLE vtreatments VIRTUAL TABLE INDEX 131073:

Now, let's put them together. The following query takes 439ms and also returns 10 rows. As one can see, Query 1 is more than 380 times slower than Query 4 even though it produces identical results.

Query 4

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN (SELECT * FROM treatments WHERE treatmentId IN 
            (SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus")) t
            ON materialsCitations.treatmentId = t.treatmentId
WHERE 
 	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND t.deleted = 0
GROUP 
	BY collectionCode;

--QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--LIST SUBQUERY 1
|  `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--LIST SUBQUERY 1
   `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m

Consistently, my experience is that when I JOIN a virtual table to a normal table, the query is no longer performant. Is that expected, or can I do something about it to make such a JOIN efficient? If yes, then I will try to change my program so it creates queries like Query 4, although given the unpredictability of what params the users might submit, it is going to be very tricky.

A related observation is that querying a virtual table is usually very fast unless the query finds a lot of results. I don't have a cut-off number, but a query finding fewer than 100 rows is blindingly fast but a query finding 80K rows is very slow. And, I don't really mean "returning", I really mean "finding". That is, just returning the Count(*) of matching rows is very slow when the count is very high. I would expect that such a count would be returned from some kind of term frequency index.

10:09 Edit: JOINs with FTS5 virtual tables are very slow (artifact: e2debc62ee user: punkish)

I had submitted the following question a few days ago to which Dan Kennedy very kindly replied. But the problem wasn't really resolved so I am resubmitting it. This time I am trying to be less clever and not submitting pseudo-code. Instead, these are actual queries that I ran about 5 mins ago. Here goes.

Note: the total row counts are as follows

  • treatments 308498
  • materialsCitations 269546

The following query took 170s and returned 10 rows. Let's call this Query 1

Query 1

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
	JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND treatments.deleted = 0
	AND vtreatments MATCH "carabus"
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)
`--USE TEMP B-TREE FOR GROUP BY

Let's break it down into sub-queries. The following takes 668ms

Query 2

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
    JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
    AND treatments.deleted = 0
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
`--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)

The following takes 37ms

Query 3

SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus";

-- QUERY PLAN
 SCAN TABLE vtreatments VIRTUAL TABLE INDEX 131073:

Now, let's put them together. The following query takes 439ms and also returns 10 rows. As one can see, Query 1 is more than 380 times slower than Query 4 even though it produces identical results.

Query 4

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN (SELECT * FROM treatments WHERE treatmentId IN 
            (SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus")) t
            ON materialsCitations.treatmentId = t.treatmentId
WHERE 
 	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND t.deleted = 0
GROUP 
	BY collectionCode;

--QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--LIST SUBQUERY 1
|  `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--LIST SUBQUERY 1
   `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m

Consistently, my experience is that when I JOIN a virtual table to a normal table, the query is no longer performant. Is that expected, or can I do something about it to make such a JOIN efficient? If yes, then I will try to change my program so it creates queries like Query 4, although given the unpredictability of what params the users might submit, it is going to be very tricky.

A related observation is that querying a virtual table is usually very fast unless the query finds a lot of results. I don't have a cut-off number, but a query finding fewer than 100 rows is blindingly fast but a query finding 80K rows is very slow. And, I don't really mean "returning", I really mean "finding". That is, just returning the Count(*) of matching rows is very slow when the count is very high. I would expect that such a count would be returned from some kind of term frequency index.

09:53 Post: JOINs with FTS5 virtual tables are very slow (artifact: c99fda4f62 user: punkish)

I had submitted the following question a few days ago to which Dan Kennedy very kindly replied. But the problem wasn't really resolved so I am resubmitting it. This time I am trying to be less clever and not submitting pseudo-code. Instead, these are actual queries that I ran about 5 mins ago. Here goes.

The following query took 170s and returned 10 rows. Let's call this Query 1

Query 1

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
	JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND treatments.deleted = 0
	AND vtreatments MATCH "carabus"
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)
`--USE TEMP B-TREE FOR GROUP BY

Let's break it down into sub-queries. The following takes 668ms

Query 2

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
    JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
    AND treatments.deleted = 0
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
`--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)

The following takes 37ms

Query 3

SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus";

-- QUERY PLAN
 SCAN TABLE vtreatments VIRTUAL TABLE INDEX 131073:

Now, let's put them together. The following query takes 439ms and also returns 10 rows. As one can see, Query 1 is more than 380 times slower than Query 4 even though it produces identical results.

Query 4

SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN (SELECT * FROM treatments WHERE treatmentId IN 
            (SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus")) t
            ON materialsCitations.treatmentId = t.treatmentId
WHERE 
 	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND t.deleted = 0
GROUP 
	BY collectionCode;

--QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--LIST SUBQUERY 1
|  `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--LIST SUBQUERY 1
   `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m

Consistently, my experience is that when I JOIN a virtual table to a normal table, the query is no longer performant. Is that expected, or can I do something about it to make such a JOIN efficient? If yes, then I will try to change my program so it creates queries like Query 4, although given the unpredictability of what params the users might submit, it is going to be very tricky.

A related observation is that querying a virtual table is usually very fast unless the query finds a lot of results. I don't have a cut-off number, but a query finding fewer than 100 rows is blindingly fast but a query finding 80K rows is very slow. And, I don't really mean "returning", I really mean "finding". That is, just returning the Count(*) of matching rows is very slow when the count is very high. I would expect that such a count would be returned from some kind of term frequency index.

2020-03-12
22:01 Edit: will the posts from the mailing list be migrated to here? (artifact: 984a02b101 user: punkish)

specifically wrt the posts that are still being discussed or are not fully resolved, will they be migrated here or shall we repost them here?


punkish.org
just another creative commoner

22:00 Reply: will the posts from the mailing list be migrated to here? (artifact: 0d9a73115b user: punkish)

With my question, I meant to ask: will the old posts be migrated here automatically (in other words, by you or someone who manages the list) or should we repost them ourselves here? From your answer, I take it that there will be no automatic migration, so if we want to continue an already happening conversation, we might be better off reposting the question here. No worries.

p.

21:35 Post: will the posts from the mailing list be migrated to here? (artifact: f95976cad8 user: punkish)

specifically wrt the posts that are still being discussed or are not fully resolved, will they be migrated here or shall we repost them here?


punkish.org
just another creative commoner