R-TREE SQLITE minx, maxx,miny,maxy accuracy

(1) By anonymous on 2022-06-22 13:17:07 [link] [source]

I am trying to add data to an rtree virtual table in sqlite but it changes the number from the 6th decimal place. I am saving a 7 decimal place number in the x values and it is not committing the exact number I tell it to.

Is there a way to rectify this? Even if I use the example in the rtree documentations the GPS coordinates I give the table get messed up from the 6th decimal.

Any help would be appreciated

Thanks

(2) By Richard Hipp (drh) on 2022-06-22 13:31:47 in reply to 1 [link] [source]

RTREE stores coordinates using 32-bit IEEE floats, with all of the limitation associated therewith.

RTREE was originally envisioned as an index, not as the primary data store. The idea is that if you have high-precision location information, you can store that separately, either in a separate table or in Auxiliary Columns of the RTREE. The RTREE helps you to narrow down your search for records to a small subset of the total table, but you might still need to check all the values in the results of an RTREE query to ensure that each row really does meet your constraints.

RTREE is designed such that the precision limitations of 32-bit floats might cause extra rows to appear in the result, if those rows are right on the edge, but rounding errors will never exclude rows that ought to be in the result.

(3) By Ryan Smith (cuz) on 2022-06-22 14:01:55 in reply to 1 [link] [source]

I doubt this is an SQLite problem, but from a whole lot of experience doing this sort of thing (one of our major systems do human and vehicle tracking), I can tell you that most GPS systems are not that accurate (7th decimal defines a negligible earth distance near the equator, if I recall correctly <2m), but I'll assume yours is and that you need the accuracy.

I think a normal 32-bit Float (also called "Single") is only accurate to about 7/8 digits, and this translates badly to LAT/LON storage due to the degree range - that's 7 decimals for a value < 10, 6 decimals for a value < 100 and only 5 decimals accurate above 100. I'm quoting from memory so forgive me if one of these statements are not exact.

Latitude only goes from -90 to 90 (though anything over +/-87.5 becomes nonsense on the Google Mercator) so it's always 6 decimals+ accurate, Longitude is accurate as long as you are close to Greenwhich, but over 100deg or before -100 it gets only 5 decimals of accuracy, etc. 64-bit Floats give about 16 digits precision I think, which obliterates the problem, and as far as I know SQLite will store floats as 64bit whenever needed, so this sounds to me like a 32-bit Float problem on the reading side - what API are you using to read the values, and what size float do you store the value in within your application? Are any of these done in a 32bit Float variable?

Or are you perhaps retrieving it as text? In which case, do you format it, if so, how?

(4) By Stephan Beal (stephan) on 2022-06-22 15:22:28 in reply to 3 [link] [source]

as far as I know SQLite will store floats as 64bit whenever needed, so this sounds to me like a 32-bit Float problem on the reading side - what API are you using to read the values, and what size float do you store the value in within your application? Are any of these done in a 32bit Float variable?

The rtree extensions is documented as only storing 32-bit values:

https://sqlite.org/rtree.html

under section 3:

The min/max-value pair columns are stored as 32-bit floating point values for "rtree" virtual tables or as 32-bit signed integers in "rtree_i32" virtual tables. Unlike regular SQLite tables which can store data in a variety of datatypes and formats, the R*Tree rigidly enforce these storage types. If any other type of value is inserted into such a column, the r-tree module silently converts it to the required type before writing the new record to the database.

(5) By Ryan Smith (cuz) on 2022-06-22 18:09:32 in reply to 4 [link] [source]

The rtree extensions is documented as only storing 32-bit values:

Yikes - Thank you for posting the doc ref, Stephen.

Would a 64-bit version of the RTREE module ever be on the cards? While GPS measurement devices are rarely more accurate (as mentioned), we've found trying to place things in a World map on small scale (within a room, say) to be quite handy, and that wouldn't be possible at 32-Bit accuracy.

It's not a strong feature request - We do not use SQLite for the system I mentioned (as my ignorance w.r.t. rtree format clearly shows), but I was imagining an exportable SQLite file format, which in turn would benefit from rtree, but none of that left the drawing board yet. The OP's situation may also benefit.

(6.1) By ddevienne on 2022-06-22 18:35:01 edited from 6.0 in reply to 3 [link] [source]

Maybe completely stupid...

But takes your LAT/LON degrees, and multiply them by 10'000'000;
that still fits into a signed 32-bit integer of Integer-Valued R-Trees,
which gives you a consistent 7 decimal point precision.

RTREE might still do calculations as floats, losing a part of that precision,
but as Richard explained, you can post-process the matches to use the full 32-bit precision,
instead of the 23-fraction-bits that remain from the IEEE float.

Maybe that way, you don't have to store your points twice?

UPDATE:
According to this post, 1 degree of latitude is ~111km with a spherical earth.
So 111'000 meters / 1e7 gives 0.0111 m, i.e. 11mm precision from that int,
which is precise enough for most purposes, no?

(8.1) By Ryan Smith (cuz) on 2022-06-22 20:25:59 edited from 8.0 in reply to 6.1 [link] [source]

Maybe completely stupid...

Not so I'd say, but it also won't exactly work.

10'000'000 as you suggest is already 8 digits of precision, so multiplying a low number with 7 decimals might just sneak in, example:

10'000'000 x 1.1234567 = 11'234'567 which I think still fits, but when you go higher up in degrees while trying to keep 7 decimals accuracy:

10'000'000 x 180.1234567 = 1'801'234'567 which can't fit in a 32-bit float in any way shape or form, and you will lose the last 67, perhaps even the 567.

Hope that makes the precision problem more clear.

As to the second section:

So 111'000 meters / 1e7 gives 0.0111m

I don't think it translates like that exactly, depends on the minimum change in value that can be had from a Float32, and if memory serves it was around 3cm (rather than your 1.1cm), however, that is close enough for happiness and suffices for the next point.

3cm (or 1.1) that can be had using all 7 digits in the decimals only is plenty good enough at human scales [7 digits = .1234567], which is why 7 decimals is accurate enough and used almost universally when representing LAT/LONs. This also holds for longitude at the equator.

However,

the moment you move away from "middle earth" (0th degree lat and lon) by over 1 degree, that accuracy [Edit: when stored in a Float32] falls to 6 decimals [7 digits = 1.123456], which is still a decent ~30cm, though it is already more than the width of some walls.
Going over the 10 degree mark [7 digits = 10.12345] you fall to 5 decimals or ~3m accuracy mark, and
over 100deg [7 digits = 100.1234] it's an abysmal ~30m, which is an error bigger than most people's houses.

That's the real trouble with 32bit GPS LAT/LON coordinates.

PS Edit: Someone suggested this would be more clear:

        7-digits constant width
Value: [.1234567] - 7 decimals
+1     [1.123456] - 6 decimals
+10    [10.12345] - 5 decimals
+100   [100.1234] - 4 decimals only

(9) By anonymous on 2022-06-22 21:03:38 in reply to 8.1 [link] [source]

I’m afraid you’re missing my point that the 10M multiplied lat/Lon degree values would be stored as integers, and 1.8B fits in there.

(11) By Ryan Smith (cuz) on 2022-06-22 22:30:36 in reply to 9 [link] [source]

Apologies, I thought you had meant to try store Integer values into the current Float32 format (which won't work)... I did indeed not realise you advocated for changing the RTREE format to store 32bit Integers rather than floats (which would work, but if you going to go that far, why not just upgrade it to 64-bit floats?).

(13) By ddevienne on 2022-06-23 07:10:37 in reply to 11 [link] [source]

I'm not advocating to change anything, I'm working around the limited precision of 32-bits floats in this case,
to use the existing rtree_i32 variant, that uses integers instead of floats for storage (but not computation).
Which I explicitly linked to in my previous post. Sorry, I was apparently not clear enough.

(16) By Ryan Smith (cuz) on 2022-06-23 09:01:23 in reply to 13 [link] [source]

Yes, I absolutely missed the link, the integer rtree should definitely work, and thanks for the link, I might use this myself!

(14) By anonymous on 2022-06-23 07:11:05 in reply to 11 [link] [source]

A 32-bit integer version already exists.

(7) By Keith Medcalf (kmedcalf) on 2022-06-22 19:42:04 in reply to 3 [link] [source]

The "location precision" of "decimal degrees at the equator" (this is the minimum -- as you get further away from the equator the "distance" covered by a degree diminishes from 60 nautical miles at the equator to 0 angstroms at the poles) looks like the following:

places 	degrees 	distance
0 	1.0 	        111 km
1 	0.1 	        11.1 km
2 	0.01 	        1.11 km
3 	0.001 	        111 m
4 	0.0001 	        11.1 m
5 	0.00001 	1.11 m
6 	0.000001 	0.111 m
7 	0.0000001 	1.11 cm
8 	0.00000001 	1.11 mm

One degree is 60 nautical miles which is 111120 meters.

Here is the "minimum epsilon" values for the various IEEE754 floating point types (note that the epsilon for binary128 and binary256 show as zero because the number is smaller than can be represented in binary64 output to 17 decimal places).

with sig(prec, sig) as 
     (
      values ('binary16',11),
             ('binary32',24),
             ('binary64',53),
             ('binary128',113),
             ('binary256',237)
     )
select prec, 
       format('%!.17f', ulp(360.0, sig)) 
  from sig;
┌─────────────┬──────────────────────────────────┐
│    prec     │ format('%!.17f', ulp(360.0,sig)) │
├─────────────┼──────────────────────────────────┤
│ 'binary16'  │ '0.25'                           │
│ 'binary32'  │ '0.000030517578125'              │
│ 'binary64'  │ '0.00000000000005684'            │
│ 'binary128' │ '0.0'                            │
│ 'binary256' │ '0.0'                            │
└─────────────┴──────────────────────────────────┘

This would indicate that binary16 can hold a value that is accurate to 27 kilometres at the equator; binary32 can hold a value that is accurate to 111 meters at the equator; binary64 and longer can hold a value accurate to less than a millimeter. This, of course, depends on how many guard bits you need. These numbers are based on having a "guard decimal 0" (about 3.6 bits I believe).

A NavStar CA fix will get you a DOP of about 50 meters and the short-code pseudo-range should get you to a DOP of about 10 meters.

A GPS CA fix will "fit" in a binary32 and have about equal precision.

Anything more accurate than a CA precise coordinates will require at least binary64 storage (note that this is easily obtained with pseudorange equipment that was available before the turn of the century when the appropriate "error dispersion" corrections are applied).

(10) By Bazza (Bazza81) on 2022-06-22 22:15:53 in reply to 1 [link] [source]

I am the original poster. Ok worst case I’ll save the data in a complimentary table. I am actually using the rtree for spatial analysis and my 6 digit number is actually an encoder saved as km accurate to the mm so being out a couple of mm does not make this a viable solution to read directly from the mins and maxs. Its just weird because 32bit float I assumed would save perfectly but there is something the rtree is doing that I don’t understand. I’ll save the actual data in another column for drawing the mbrs on a graph. Out of interest, I analyse approx 100kms of data with rtree and it takes 2 minutes. When I use the normal table index it takes 3 hours. So definitely r-tree was the way to go. I just need to show the exact mbr on the graph.

(12) By Bazza (Bazza81) on 2022-06-22 23:45:21 in reply to 10 [link] [source]

As a side note and out of interest I have declared the table using "rtree_i32" and multiply my values by 1,000,000 to give mm and see if that gives the desired results.

INSERT INTO demo_index_32 (minx,maxx,miny,maxy) VALUES (123456789,987654321, 111, 222)

works but as soon as I go to 10 digits

INSERT INTO demo_index_32 (minx,maxx,miny,maxy) VALUES (1234567890,9876543210, 111, 222)

The table doesn't fail but commits an incorrect value in maxx i.e 2 1234567890 1286608618 111 222

A signed 32-bit integer variable has a maximum value of 2,147,483,647 (just interestingly doesn't give any errors).

Thank you everyone for the comments and I have a better understanding of the limitations of r-tree. Much appreciated!

(15) By ddevienne on 2022-06-23 07:12:55 in reply to 12 [link] [source]

That's simply because the dummy value you chose, 9'876'543'210, overflows a 32-bit integers.
When real LAT/LON values in the max-range [-180, +180] pre-multiplied by 10'000'000 do not, and will always fit.

(17) By Ryan Smith (cuz) on 2022-06-23 09:04:39 in reply to 12 [link] [source]

For LAT/LON coordinates specifically at 7-digit accuracy, you'll find that the max values (in the proposed scheme) are -1,800,000,000 to 1,800,000,000, which fits very well into +/-2,147,483,647, even if larger 10-digit values do not.

(18.3) By greg (greglearns) on 2022-06-23 14:20:57 edited from 18.2 in reply to 17 [link] [source]

I'm a little confused what one should do SPECIFICALLY for Sqlite RTREE, so I want to make sure I'm understanding this correctly.

It sounds like it is BETTER to use an RTREE_i32 instead of float-32 RTREE by multiplying by some power of 2? or 10,000,000? And, (in almost all cases?) one will get more accuracy by multiplying by a number to turn it into an int instead of using a 32-bit float (and then dividing by the same number when taking it out of the database).

So, my new rule in my head will be: "Use RTREE_i32 instead of the standard RTREE (float-32s) for Geo / world coordinates" (-180/+180 lng and +90/-90 lat).

@ddevienne points out an issue that I'm not sure how it affects things (specifically about the 23-fraction bits): "RTREE might still do calculations as floats, losing a part of that precision, but as Richard explained, you can post-process the matches to use the full 32-bit precision, instead of the 23-fraction-bits that remain from the IEEE float." I'm not sure what the implication of this is for using Geo / world coordinates.

Also, Keith Medcalf (kmedcalf) and Ryan Smith (cuz) makes a point about accuracy, but I'm not sure I grasp the implications for int32/float32 (which may only have 23 bits for -180/+180 lng and +90/-90 lat?)? ... I think the take away is still that one gets more accuracy by using RTREE_i32.

... and, it sounds like you get good (down to a few mm?) coverage of the entire world (lat/lng) using an RTREE_i32 (with multiplying/dividing by 10,000,000 (or some other number) when one writes/reads a value from the RTREE table.

(19) By Ryan Smith (cuz) on 2022-06-23 16:11:31 in reply to 18.3 [link] [source]

I think you have most of it, let me try be very clear on the finer points:

RTREE_i32 seems to use INT32 as it's storage format rather than Float32. (Note that I have not used the i32 version ever before, but from the suggestions it seems good).

That said, ALL of the storage stores exactly 32 bits, the difference is only in how those 32 bits are mapped. For instance a Float32 (IEEEE754) floating point format maps 1 sign bit, 23 bits used for the mantissa (significant digits) and 8 bits for the exponent. That is, only 23 of the bits really carry information on the mantissa or precision of the value, the other bits move the floating point or change the sign. 23 bits allow us 7 decimal digits of precision.

With a 32-bit Integer, there's still 1 sign bit, but all the other 31 bits carry precision information, allowing us 9 decimal digits of precision, and if we stick to below 2.1Bil, then we can use 10 digits.

What Richard suggests is that you could use the RTREE index to simply narrow down your search and then look up the actual (very precise) values in an adjacent table. i.e. use the RTREE as an index next to your data table. This means you can save lots of other fields also, and as high precision as you want, while still having superfast lookups using the index (even if it may hit one or two extra references), so this brings a lot of advantages, but with more data and an extra step in all lookups.

What Dominique suggests is that knowing the the underlying value is indeed 32 bits, design a translation function that converts the stored 32 bits to another more accurate format than FLoat32, much like a type-cast, but of your own design. More precisely, using the i32 version of RTREE in which all 32 bits (1 sign + 31 data) in storage are already in a format that is clear and easy to work with (Integer) and for the conversion function simply multiply the coordinate by 10⁷ (or 10,000,000) to ensure you push all 7 useful significant LAT/LON decimals into the full Integer value, which, as long as you stay under 2.1 Billion (which +/-1.8Billion does), will fit snugly in those 32 bits.

Of course the conversion goes both ways, so to get the RTREE data value for Longitude of 179.1234567, you x10⁷ to get 1791234567, and to see the Longitude for integer that value, simply divide again by 10⁷.

This way you will have pinpoint accuracy (better than 3cm) over the total of Earth's surface at the average ground-level, but you will need those extra conversions everywhere you reference or add the data points.

Note that the accuracy is a rough estimation because the area marked out by 4 LAT/LON coordinates will grow in size with height, because projection lines from the center of a sphere will forever grow further apart as they continue outward, ditto for the quoted accuracy - it gets better as you come closer to ground/sea-level, and better still as you go underground.

I hope that explains it - Feel free to ask if any of this is still unclear.

(20) By greg (greglearns) on 2022-06-25 13:40:48 in reply to 19 [link] [source]

THANK YOU! That is exactly what I was looking for. To summarize:

IF you are okay with "casting" your LAT/LON values into an 32-bit Integer, by multiplying each by 10^7 (10,000,000) when you insert into the Sqlite RTREE_32 Table, and dividing by 10^7 when you read the values out...

THEN you get more usable accuracy in your LAT/LNG values (31 bits (10 significant digits) vs 23 bits (7 significant digits)).

AND, given that all LAT/LNG values (as long as you are not reaching into space...) can be represented in 31 bits comfortably

THEN 32-bit Integers (RTREE_32) sounds like a win every time to me.

... Unless ...

I guess there is one outstanding question, which is given that RTREE_32 translates to 32-bit floats (21 bits, 7 significant digits)... how does that affect things? It seems like that will mess things up...

(22) By Ryan Smith (cuz) on 2022-06-25 15:39:20 in reply to 20 [link] [source]

I think RTREE_i32 uses 32bit-INTEGER natively (hence the name), and I am not sure by which extent it simply maps to 32 bits or have different tooling to specially handle 32 bit integers, but none of that should matter to you, from your point of view, you will convert LAT/LON to integer by multiplying it by 10^7 and divide by the same to convert back, you can trust the RTREE to do its thing correctly.

PS: Restating an earlier admission: I have not used the i32 version of RTREE yet, so perhaps it will be good for you to try the methods we've discussed, and just do some proof-of-concept tests to prove to yourself it works, before building the entire system on the premise.

(23) By ddevienne on 2022-06-27 07:12:21 in reply to 20 [link] [source]

given that RTREE_32 translates to 32-bit floats [for calculations], how does that affect things?

It can trigger false positives or false negatives, when the large integers are converted back to floating-point, thus losing a bit of the precision.

So there's an additional burden to enlarge or shrink your min/max-boundaries for example,
to make sure you only get the kind you want (false positives or false negatives),
and double-check the results yourself using the full precision you want.

I.e. RTree narrows down the search space, and you retest it with more precise calculations of your own (which is easy with just range-tests).

And you need to do that only if you really care about precision.

(24) By greg (greglearns) on 2022-06-27 21:42:24 in reply to 23 [link] [source]

Ok, to summarize this conversation:

I'm guessing (?) that the takeaway from the fact that RTREE_32 uses 32-bit floats for all computations means that there is no benefit to using RTREE_32 over RTREE in terms of increased accuracy or resolution. Therefore, there isn't much point in using RTREE_32 and doing the multiply/divide by 10,000,000 if your data isn't already in INTEGER format... which means, just use the standard RTREE (32-bit floats).

So, the rules (now) sounds like: use RTREE with 32-bit floats since converting to RTREE_32 doesn't get you any extra benefit.

Correct?

(25) By Ryan Smith (cuz) on 2022-06-28 00:26:26 in reply to 24 [link] [source]

I think you have missed some crucial bit of information...

RTREE_i32 (note the "i" before the 32) is a native INTEGER RTREE format.

In contrast, the usual "RTREE" module (sans any other nomenclature) is a native FLOAT_32 format. There is no "RTREE_32" module.

You can use the original RTREE (using Float32 "single precision" storage format) with the caveats of diminished accuracy as explained before,

-- OR --

You can use the alternative RTREE_i32 module with its INTEGER native storage by means of converting your float values (as LAT/LON is want to be) with the multiplication trick discussed previously, to change between the actual LAT/LON values, and the INTs that will be stored in the RTREE_i32 table.

The "extra benefit" in using this second method is that you get full accuracy (<3cm on Earth's surface) over the full range of degrees (-180 to 180 LON, -90 to 90 LAT), but it comes at the price of having to multiply LAT/LON values to convert to integer before adding it to the table (or formulating a query), and when reading values from the table (which will be INTs), having to divide it to convert it back again.

That's about as clear as I can be, and hoping it clarifies the conversation some.

(26) By greg (greglearns) on 2022-06-28 01:44:07 in reply to 25 [link] [source]

Ryan, when I was writing "RTREE_32", I meant to be writing "RTREE_I32", so I'm aware that RTREE_I32 stores values as 32-bit integers. My apologies for missing that and causing confusion!

I'm trying to understand the implication of the second sentence in the RTREE documentation:

"An rtree_i32 stores coordinates as 32-bit signed integers. EVEN THOUGH IT STORES VALUES USING INTEGER, THE RTREE_I32 VIRTUAL TABLE STILL USES FLOATING POINT COMPUTATIONS INTERNALLY AS PART OF THE R-TREE ALGORITHM.

If it's using floating point computations internally, then doesn't that mean that it will have to lose the extra accuracy of the 32-bit integers, essentially rendering the 32-bit integers the same as a 32-bit float?

(27) By Richard Hipp (drh) on 2022-06-28 01:49:22 in reply to 26 [link] [source]

IIRC, it uses 64-bit (double) floating point values for internal computations. I think that is sufficient to preserve the full 32-bits of RTREE_I32.

(30) By ddevienne on 2022-06-28 08:05:53 in reply to 27 [link] [source]

That would be a game-changer, as indeed a double preserves an 32-bit int precision.
Reading the doc, I assumed the code was using float, but if it really is double, I think the doc ought to state that explicitly.

(28) By Larry Brasfield (larrybr) on 2022-06-28 02:57:40 in reply to 26 [link] [source]

Confirming Richard's claim that doubles preserve 32-bit integer values: The 64-bit IEEE-754 double used on modern hardware uses 53 of its 64 bits to store the sign and magnitude, which is adequate to store any 32-bit integer without losing precision.

(31) By Ryan Smith (cuz) on 2022-06-28 09:42:39 in reply to 26 [link] [source]

Thank you for clarifying, I see now your confusion.

Also thank you Richard and Larry for clarifying, and may I also +1 Dominique's suggestion of stating the 64-bitness of internal calculations in the RTREE_i32 module explicitly so this erroneous assumption can be avoided.

(33) By greg (greglearns) on 2022-06-28 20:08:49 in reply to 31 [source]

Thank you all, this was exactly what I was looking for! And I agree that adding to the documentation that internally it uses 64-bit floats is a great idea.

(21.1) By Harald Hanche-Olsen (hanche) on 2022-06-25 15:05:20 edited from 21.0 in reply to 19 [link] [source]

Fun factoid, probably not relevant here, but anyhow: Handheld Garmin GPS receivers use(d) signed 32 bit integers for latitude/longitude internally, with the 32 bits exactly wrapping the circle, covering the range from −180° exactly to just a tad short of +180°.

(This information is years old, from memory – so apply a grain of salt.)

(29) By Keith Medcalf (kmedcalf) on 2022-06-28 04:16:01 in reply to 21.1 [link] [source]

That would make each unit in the integer equal to 4.190951586745435e-08 degrees.

That is int(deg / 4.190951586745435e-08) where -180 <= deg <= 180

That would preserve about 6 or 7 decimal digits of precision.

The conversion constant C = (180.0/(2**32-1.0))

int = deg / C
deg = int * C

(32) By Keith Medcalf (kmedcalf) on 2022-06-28 17:00:30 in reply to 29 [link] [source]

That should be 2**31-1, of course, for a C of 8.38190317544243e-08

That means that -180 degrees is: -2147483647
and 180 degrees is: 2147483647

Still 6 or 7 digits of decimal degrees precision is maintained (about 1 cm resolution at the equator).