Country-coding from Lat/Long, using GeoPoly and/or R-Tree
I have tables of geometrical objects, with Latitude/Longitude coordinates,
and we'd like to filter that data by countries. Given a set of polygons
for countries, and SQLite's GeoPoly extension, it seems like this should be relatively easy.
- Has anyone done that, and could share experience?
- Where did you get the country boudaries? At which resolution?
- Can the lat/lon is within country X be done efficiently using indexes? Perhaps an R-Tree index, for pre-filtering?
- For borderline lat/lon, can nearby countries be returned as well?
- How about perform? What should I watch out for?
At this point, I'm just looking for community advise before diving into
this myself, as I'm never used R-Tree and GeoPoly with SQLite. TIA, --DD
(2) By Gunter Hick (gunter_hick) on 2020-04-21 12:14:32 in reply to 1 [link] [source]
Apart from some member states of the USA, countries do not tend to be rectangular. They may also consist of several disjunct areas aka overseas territories. The topological Definition of "inside" and "outside" is the parity of the count of intersections with the outline along an arbitrary path between the point of interest and a point whose inside/outside property is known. e.g. draw a straight line from the POI to the (center of) capital of the country. If the number of intersections is even (ideally zero for convex polygons), then the POI is "inside" the country. Intersecting at a tangent does not count.
I implemented something similar for our traffic notification app. We were working on complex shapes which have convex and concave sides, do not intersect and do not have holes in them.
We did it using a combination of techniques.
For each area we create an outer rectangle held in an R-tree to allow a quick check to see if any point matches. This could have overlapping rectangles.
For each group of rectangles matched, we then went and checked against an outer hull, this is the area that is a polygon but has no concave faces.
If we then got a match here we then checked against it being inside the proper polygon as described by Gunter earlier.
We did try to go straight to point 3, but we found that the two earlier checks made things faster as most points are NOT in an area of interest. Your use case would probably be best started at point 2 if you know that the point is always in a country and not over a sea.
Defining the boundaries of a country could be rather complex as borders often follow weird lines as rivers or mountain ranges or roads. Also countries borders do change. How low a level of detail do you want? If you are relying on GPS then the accuracy is variable, sometimes 100m, sometimes 5m.
The "Geopoly" extension mentioned in the link of the original post supports non-rectangular regions. Each entry in Geopoly is a polygon. The extension supports primitives to:
- Determine if polygons intersect
- Determine if one polygon is entirely contained inside another
- Determine if a point is inside or outside a polygon
- Find the area of a polygon
- Perform linear transformations (flipping, stretching, rotating) on polygons.
- Render polygons as SVG.
The inputs are GeoJSON descriptions of polygons. You can search online to find lots of maps in the GeoJSON format, which can be directly imported into a Geopoly index.
An SQL script that demonstrates Geopoly's capabilities is in the source tree in the ext/rtree/visual01.txt file. Run this file as follows:
sqlite3 <visual01.txt >geopoly-demo.html open geopoly-demo.html
For your viewing convenience, I have uploaded the output to https://sqlite.org/tmp/geopoly-demo.html.
Thanks Richard. Very nice demo too. Upload for online access doubly nice too.
BTW, your demo made me think about hover and click support on SVG elements,
when inside the HTML page, and reading this nice article (among others), it seems
possible, but perhaps more control on
class elements may be
necessary for the SVG's
<polyline> elements generated by the SQL function.
I'll cross that bridge when I actually look into all that.
Thanks Gunter. As I alluded to, and Richard confirmed, GeoPoly does these things.
So this was a little too low-level an answer :). Appreciated nonetheless.
Geopoly works quite well. I have implemented (among others) a system which will find the approporate IANA Timezone for a given lat/long -- it works quite well, with the following caveats:
- there is a standard for polygon objects -- nothing follows it, especially things that claim they do
- the SQLite3 implementation does not "support" holes
- it is possible for one country to be entirely contained within another and you will have to figure out how to resolve a point being "inside" two "countries" at once
- yes, you can find "near" countries simply by making you location "point" into a circle and getting the list of overlapping polygons
- it is very fast but only uses single precision floating point so the resolution if you use native spherical coordinates (ie, lat/long) is about on par with uncorrected orbital pseudorange systems
Thanks Keith. All very good points.
If you don't mind, where did you get your timezone polygons from?
And what did you have to fix to get proper GeoJSON polys from those sources?
One good source of data seems to be Natural Earth Vector on GitHub, but
the data comes in ShapeFile (.shp, .shx) and .DBF and .CPG formats,
not something GeoPoly can use as-is :(
I got the data from GitHub. There is a lot of polygon files wafting around the Internet. Many of them claim varying levels of GeoJSON compliance.
is where I found the geojson files I used. The geojson files contained multiple polygons and sometimes multiple holes so they required a bit of pre-processing.
(11.1) Originally by Peter Johnson (missinglink) with edits by Richard Hipp (drh) on 2020-04-22 13:15:43 from 11.0 in reply to 7 [link] [source]
We publish a huge amount of openly-licensed and free-of-charge geographic data here https://geocode.earth/data/whosonfirst
Select the country you're after or 'XY' for things above the country-level such as Timezones, Oceans, Continents etc.
If you click the 'name_compressed' link under 'SQLite Databases' you'll get a compressed SQLite database, in here you'll find an 'spr' table with a summary of the place and a 'geojson' table containing the geometries and metadata.
The source of the data is the https://whosonfirst.org project which is managed by the same person as the Natural Earth project you linked.
(10) By David Raymond (dvdraymond) on 2020-04-21 16:22:28 in reply to 1 [link] [source]
As others have stated, country borders can be "fun".
If I'm remembering R-Tree basically uses bounding boxes to help narrow down the candidates for you to do the more detailed check on. Consider for example that the Aleutian Islands in Alaska cross the 180th meridian. So the bounding box for the US is basically 360 degrees wide. So any point in the latitude range from the south of Hawaii to the north of Alaska will have the US show up as one of the possible countries to do a detailed check on.
Depending on how detailed you need it you've also got multi-polys, holes, enclaves, whether the border ends at the ocean shore vs X nautical miles offshore, etc.
And Lord help you if you're trying to do business in the countries with a border dispute. Country X will be rather harsh to you if you try to sell a product in their country that says area A is actually in country Y instead of being in country X like they claim it is. And likewise with country Y if you claim that A is in country X.
But at least most of those are problems for the people making the map you're using, and not so much for you.