SQLite Forum

comparing rtree and geopoly for the same geographies
Login

comparing rtree and geopoly for the same geographies

(1) By punkish on 2023-04-10 20:27:30 [link] [source]

I am testing both rtree and geopoly extensions for searching points, and I am having mixed luck -- for the same data rtree works and geopoly does not. Perhaps some kind soul can tell me the error of my ways. Given the following tables:

CREATE TABLE m (…, lat, lng);
CREATE VIRTUAL TABLE mGeopoly USING geopoly (lat, lng);
CREATE VIRTUAL TABLE mRtree USING rtree (
    id, minX, maxX,minY, maxy
);

Below is my pseudo-ish JavaScript code that inserts the data

rows = `SELECT DISTINCT lat, lng FROM m`;
insGeopoly = `INSERT INTO mGeopoly (_shape, lat, lng) 
    VALUES (@_shape, @lat, @lng)`;
insRtree = `INSERT INTO mRtree (id, minX, maxX, minY, maxY) 
    VALUES (@id, @minX, @maxX, @minY, @maxY)`;

id = 0;
for (row of rows) {
    id++;
    minX = row.lng;
    maxX = row.lng;
    minY = row.lat;
    maxY = row.lat;
    ll = `[${minX},${minY}]`;
    lr = `[${maxX},${minY}]`;
    ur = `[${maxX},${maxY}]`;
    ul = `[${minX},${maxY}]`;
    _shape = `[${ll},${lr},${ur},${ul},${ll}]`;
    
    insGeopoly.run({_shape, lat: row.lat, lng: row.lng});
    insRtree.run({id, minX, maxX, minY, maxY});
}

Now I select for points within random lng/lat boxes, and get results from the rtree table but nothing from the geopoly table.

selRtree = `SELECT *   
            FROM mRtree r JOIN m ON r.minX = m.lng AND r.minY = m.lat 
            WHERE 
                minX>=@minX AND maxX<=@maxX AND 
                minY>=@minY AND maxY<=@maxY`;
selGeopoly = `SELECT * 
                FROM mGeopoly g JOIN m ON g.lng = m.lng AND g.lat = m.lat 
                WHERE geopoly_within(_shape, @_shape)`;

for (let i = 0, j = 50; i < j; i++) {
    minX = randomLng();
    maxX = minX + 5;
    minY = randomLat();
    maxY = minY + 5;
    ll = `[${minX},${minY}]`;
    lr = `[${maxX},${minY}]`;
    ur = `[${maxX},${maxY}]`;
    ul = `[${minX},${maxY}]`;
    _shape = `[${ll},${lr},${ur},${ul},${ll}]`;

    selRtree.all({minX, maxX, minY, maxY});
    selGeopoly.all({_shape})
        
    if (resRtree.length) {
        console.log(`${minX.toFixed(2)}, ${minY.toFixed(2)}, ${maxX.toFixed(2)}, ${maxY.toFixed(2)}: rtree (${resRtree.length}), geopoly (${resGeopoly.length})`);
    }
}

// output
-61.06, -11.57, -56.06, -6.57: rtree (1), geopoly (0)
25.97, 33.91, 30.97, 38.91: rtree (7), geopoly (0)
7.60, -31.09, 12.60, -26.09: rtree (1), geopoly (0)
15.32, 16.93, 20.32, 21.93: rtree (1), geopoly (0)
58.14, 40.94, 63.14, 45.94: rtree (2), geopoly (0)
-61.15, -36.35, -56.15, -31.35: rtree (4), geopoly (0)
69.97, 31.80, 74.97, 36.80: rtree (3), geopoly (0)
-137.29, 54.07, -132.29, 59.07: rtree (1), geopoly (0)
98.06, 7.11, 103.06, 12.11: rtree (6), geopoly (0)
-79.21, 18.92, -74.21, 23.92: rtree (1), geopoly (0)

(2) By Rowan Worth (sqweek) on 2023-04-19 07:59:46 in reply to 1 [link] [source]

Does it work if you use shapes with a non-zero area? Here everything you insert has minX=maxX and minY=maxY, which at the very least creates ambiguity about whether the points are in clockwise or counter-clockwise order because all four corners share the same coordinate.

(3) By punkish on 2023-04-19 08:33:57 in reply to 2 [link] [source]

hi Rowan, I am actually in the process of writing up my very detailed observations and experience on this issue to not just document it forever but also hopefully to help someone else in my situation. For now, let me quickly respond to your question -- yes and no. search in geopoly doesn't work with zero-area shapes while rtree does (basically because rtree doesn't really store polys… it stores only the four corners of a poly). Both do accept zero-area shapes though. I will finish my documentation and make it available soon as I can.

(4) By Bidski on 2023-05-04 22:36:24 in reply to 3 [link] [source]

Did you ever finish writing up your observations for this? I would be interested to know what you found

(5) By punkish on 2023-05-07 21:44:21 in reply to 4 [link] [source]

see https://sqlite.org/forum/forumpost/5e12f4f3c0 (or https://punkish.org/Geodata-with-SQLite/). Please let me know if it helps or needs modification to make it more useful.

(6) By Ryan Smith (cuz) on 2023-05-08 09:00:13 in reply to 5 [source]

Some suggested changes:

Your formula for testing a healthy Lat/Long entry needs to be slightly improved in the "validGeo" column. You have this:

CREATE TABLE IF NOT EXISTS t (
        id INTEGER PRIMARY KEY,
        longitude REAL,
        latitude REAL,
        desc TEXT,
        validGeo BOOLEAN GENERATED ALWAYS AS (
            typeof(longitude) = 'real' AND 
            abs(longitude) <= 180 AND 
            typeof(latitude) = 'real' AND 
            abs(latitude) <= 90
        ) STORED
)

The test needs to be ABS(latitude) < 90 (not "<=") because at 90 degrees the math formula will become undefined/NULL (since dividing by COS(90) = DIV0!). Having done this in some projects I can also tell you that the safe thing is to check for Latitude being within the 87 degree boundaries, as anything above 87 degrees (or below -87) becomes silly in calculations as the meters-per-degree of longitude changes significantly more than that of latitude. In fact last time I checked the Google reference Mercator (in case you wish to draw/refer Geo data from there) doesn't allow for above 85 degrees offset from the Equator. It's been a while though, so this may have changed. To be more exact, every square (as your geopolyRegular() function claims to make) in Lat/Lon coordinates, unless it has its center on the Equator, is a Trapesium shape in real meters, not a square, and the distortion intensifies as latitude increases, to the point where above 87deg it becomes very close to a long thin Triangle.

It's also maybe useful to note that the formula is a rough approximation to the averaged Geodesic and not precise, nor taking elevation into account or the fact that Earth's geodesics are not perfectly spherical (there are formulae that do), though it should still be within 5m latitudinal distance at the worst error. (In case someone is interested in sub 5m distances of their Lat/Lon/elevation).

There's a lot more to say, but I'll stop here since I doubt you need to get into the detail above, but perhaps just say that the calculations used are for demonstration and more accurate ones do exist, in case someone takes your article as Geo-precise.

Lastly, you never mention what you scripted it in before diving right in - I had to read through some bit of code to detect the ES6 notation before the light went on, but even then, understanding it was done specifically in NodeJS only became clear at the bottom of the page when I saw a link to it. (I realize the language is not important to the example, but it still helps to know what you are looking at to follow the meaning).

Good article though, succinct and showing the idea schematics and SQL clearly.

(7) By punkish on 2023-05-08 10:46:27 in reply to 6 [link] [source]

thanks for your suggestions. I have incorporated them in the article.