decimal precision

(1) By anonymous on 2022-12-13 09:38:58 [link] [source]

Hi,

I see that when handling queries with decimal numbers the precision may cause irregularities. for example the query 'SELECT value from foo where value = 100.00000000000000001' will return a row where the value is 100. What is the maximum precision supported?

Thanks

(2) By Tim Streater (Clothears) on 2022-12-13 10:32:24 in reply to 1 [link] [source]

If you look here:

   https://www.sqlite.org/datatype3.html

you'll see that SQLite has a 'real' type and it's "an 8-byte IEEE floating point number". I expect Wikipedia or similar can then help you further.

(35) By jose isaias cabrera (jicman) on 2023-08-08 14:45:46 in reply to 2 [link] [source]

I know this is a loaded question, and I duckduckgo'ed it, but couldn't really find an answer. Although, lots of hits found and lots of explanation, but no real numbers comparing both. I know it's somewhere, but perhaps one of you already know the answer: how much faster, really, is using IEEE floating point numbers than the normal real numbers? For instance, how much faster is multiplying 106310324.37 x 106310324.37 using IEEE floating point number vs normal real number?

(36) By Richard Hipp (drh) on 2023-08-08 15:11:23 in reply to 35 [link] [source]

By "normal real numbers", do you mean the numbers computed using the decimal extension to SQLite? If that is what you mean, then the answer is probably thousands of times faster. Maybe tens of thousands or hundreds of thousands of times faster. I dunno because I have not measured.

You can probably write an arbitrary-precision floating point package that is much faster than the decimal extension. Part of the design goal of decimal.c is to be reasonably small and simple so that can be more easily verified as correct. More sophisticated algorithms could be used which are a lot faster, but which are also more complex and require more code and are more likely to contain bugs (at least if written by me). The purpose of the decimal.c extension is to be precise and correct. Speed is not a consideration.

But even the fastest arbitrary-precision floating point package will still be substantially (hundreds of times?) slower than the IEEE 754 binary64 floating point that it baked into the silicon of all modern processors.

(37) By jose isaias cabrera (jicman) on 2023-08-08 15:28:33 in reply to 36 [link] [source]

... the answer is probably thousands of times faster. Maybe tens of thousands or hundreds of thousands of times faster. I dunno because I have not measured.

WOW! That is fast. Perhaps there should be a small example table with some actual comparisons of "SQLite Decimal extension" vs. "IEEE" showing the differences. This can be of help when people ask this question. I know it has come up a lot of times. Just thinking out-loud. :-) Thanks for that answer.

(3) By Tim Streater (Clothears) on 2022-12-13 10:34:38 in reply to 1 [link] [source]

Oh, and I forgot to add: trying to directly compare floating point numbers in a computer is a fundamental mistake. One which I learnt in 1965.

(5) By Chris Locke (chrisjlocke1) on 2022-12-13 13:04:28 in reply to 3 [link] [source]

1965? I learned it in 1964.9999999997. That was a rough year. Almost as bad as 1972.99999999999999999998

(6) By ddevienne on 2022-12-13 13:45:55 in reply to 3 [link] [source]

Not really. Trying to compare the result of floating-point computations is a mistake.

But comparing floating point numbers after a round-trip to some persistence or transmission mechanism, which qualifies for SQLite, is definitely NOT a mistake in my book.

We're all working with databases in general, and SQLite in particular, to accurately store and retrieve exactly what we put in.

But sure, results of SQL or any calculation on FPs is subject to approximation. But once you do have an IEEE FP of some size, SQLite exactly preserves it, as long as it is at most 64-bit.

(4) By Stephan (stephancb) on 2022-12-13 11:17:00 in reply to 1 [link] [source]

For handling decimal numbers with high precision there is the decimal.c extension :

sqlite> select decimal_cmp(100.00000000000000001,100);
1

i.e. SELECT value from foo where decimal_cmp(value, 100.00000000000000001)==0; would not return rows where the value is 100.

(33) By anonymous on 2023-08-07 22:02:09 in reply to 4 [link] [source]

But why does this happen? Shouldn't these be equal?

sqlite> select decimal_cmp(100.00, 100);
1

(34.1) By Adrian Ho (lexfiend) on 2023-08-08 11:45:59 edited from 34.0 in reply to 33 [link] [source]

Internally, Decimals are stored as their textual representation, so you need to ask "how is SQLite parsing the numbers you give it, and what are their textual representations?"

sqlite> select cast(100 as text);
100                     -- INTEGER 100
sqlite> select cast(100.00 as text);
100.0                   -- REAL 100.00
sqlite> select cast(100.00000000 as text);
100.0                   -- REAL 100.00000000
sqlite> select cast('100.00000000' as text);
100.00000000            -- TEXT 100.00000000

So 100 (integer) is stored with 0 fractional digits, but 100.00 (real) is stored with 1 fractional digit (that happens to be zero). Decimal-wise, the two are NOT the same.

It should be clear now that Stephan's original example isn't comparing what you think it is. Instead, it's incorrect in two different ways, but just happens to return the expected result:

sqlite> select cast(100.00000000000000001 as text);
100.0                   -- oops, rounding representation
sqlite> select decimal_cmp(100.00000000000000001,100);
1                       -- comparing 100.0 to 100

In case it still isn't clear where the gotchas are:

sqlite> select cast('100.00000000000000001' as text);
100.00000000000000001   -- ah, that's more like it
sqlite> select decimal_cmp(100.00000000000000001, 100.0000000);
0                       -- comparing 100.0 to 100.0
sqlite> select decimal_cmp('100.00000000000000001', 100.0000000);
1                       -- comparing 100.00000000000000001 to 100.0
sqlite> select decimal_cmp('100.00000000000000001', '100.0000000');
1                       -- comparing 100.00000000000000001 to 100.0000000

And from the other side:

sqlite> select cast(99.99999999999999999 as text);
100.0
sqlite> select cast('99.99999999999999999' as text);
99.99999999999999999
sqlite> select decimal_cmp(99.99999999999999999, 100.0000000);
0
sqlite> select decimal_cmp('99.99999999999999999', 100.0000000);
-1

And now that you know what's going on under the hood, this should not come as a surprise:

sqlite> select decimal_cmp('100.00', '100.0000000');
-1
sqlite> select decimal_cmp('100.001', '100.0010');
-1
sqlite> select decimal_cmp(100.001, 100.0010);
0

The Decimal extension is great if you're careful to always deal with canonical representations of numbers ('100.00' != '100.00000' but 100.00 == 100.00000). If your input data is slipshod, don't be surprised at garbage output.

(7.1) By Keith Medcalf (kmedcalf) on 2022-12-13 16:09:19 edited from 7.0 in reply to 1 [link] [source]

The next larger IEEE-754 double precision binary floating point number than 100.0 is:

sqlite> select format('%!.26f', 100.0 + epsilon(100.0));
┌──────────────────────────────────────────┐
│ format('%!.26f', 100.0 + epsilon(100.0)) │
├──────────────────────────────────────────┤
│ '100.00000000000002220446049'            │
└──────────────────────────────────────────┘

according to MSVC, but MinGW64 (GCC) is more exact:

sqlite> select format('%!.26f', 100.0 + epsilon(100.0));
┌──────────────────────────────────────────┐
│ format('%!.26f', 100.0 + epsilon(100.0)) │
├──────────────────────────────────────────┤
│ '100.00000000000001421389048'            │
└──────────────────────────────────────────┘

Interestingly, nextafter(100.0) on MSVC is the same as 100.0 + epsilon(100.0), but for some reason my MinGW64 compiler does nextbefore when calling nextafter. I will have to poke through the headers and see what it thinks it is doing (though note that it returns 100.0 - epsilon(100.0) so it sees the same interval to the next representable number).

NOTE nextafter is actually implemented as if it was nexttoward. The MSVC compiler treats a missing "towards" as INFINITY, MinGW64 treats it as either 0 or -INFINITY. So I fixed the definition in the UDF to always specify towards INFINITY.

For MSVC:

sqlite> select format('%!.26f', nextafter(100.0));
┌────────────────────────────────────┐
│ format('%!.26f', nextafter(100.0)) │
├────────────────────────────────────┤
│ '100.00000000000002220446049'      │
└────────────────────────────────────┘

and for MinGW32 (gcc 12.2.0):

sqlite> select format('%!.26f', nextafter(100.0));
┌────────────────────────────────────┐
│ format('%!.26f', nextafter(100.0)) │
├────────────────────────────────────┤
│ '100.00000000000001421389048'      │
└────────────────────────────────────┘

(12) By jchd (jchd18) on 2022-12-14 08:43:06 in reply to 7.1 [link] [source]

Keith,

MSVC is surprisingly wrong indeed.

I wrote a small program to display FP values around a given FP point. FP values below are computed from FP mantissa bits with an arbitrary precision decimal library.

Input value  = 100
FP (hex)     = 0x4059000000000000
Sign         = +
Exponent     = 6
Scaling      = 2^6 = 64
Mantissa     = 1/2 + 1/16

Nearby exact double values (exact computation from FP bits)
PrePrevious  = +99.999999999999971578290569595992565155029296875
Previous     = +99.9999999999999857891452847979962825775146484375
Value        = +100
Next         = +100.0000000000000142108547152020037174224853515625
NextNext     = +100.000000000000028421709430404007434844970703125

The ULPs (Unit of Least Precision) around 100 is -1.4210854715202e-14 +1.4210854715202e-14

(13) By Harald Hanche-Olsen (hanche) on 2022-12-14 09:43:02 in reply to 7.1 [link] [source]

What are these epsilon and nextafter functions you use here? I mean, it's obvious what they do, but where are they from? A private extension you wrote? They do look useful, for debugging and exploration (and education) purposes.

(14) By ddevienne on 2022-12-14 09:47:43 in reply to 13 [link] [source]

See https://sqlite.org/forum/forumpost/1ee3147fd786fa13e2be9c5ce01c49ff3271b40fbaeeba1e0facab45745a38c6

(15) By jchd (jchd18) on 2022-12-14 12:39:23 in reply to 13 [link] [source]

My program invokes _nextafter() from msvcrt.dll but I don't use epsilon.

I suspect there is a difference between my use of _nextafter and Keith' use: for the last argument I don't use Infinity or zero.

Here is how I invoke this function, which should be self-explanatory:

Func _NextToward($dbl)
	Return DllCall("msvcrt.dll", "double:cdecl", "_nextafter", "double", $dbl, "double", $dbl * 0.999999999999)[0]
EndFunc   ;==>_NextToward

Func _NextAfter($dbl)
	Return DllCall("msvcrt.dll", "double:cdecl", "_nextafter", "double", $dbl, "double", $dbl * 1.000000000001)[0]
EndFunc   ;==>_NextAfter

Multiplying the value by constants close to 1 seems to give good results over a large range but it's quite possible that things break down in corner cases. My use case was for educating unsuspecting novice programmers or FP debugging.

(8) By Gunter Hick (gunter_hick) on 2022-12-13 16:49:34 in reply to 1 [link] [source]

You need to think hard about what kind of numbers you have, what kind of operations you want to perform on them and what they represent.

The common way of handling "money" for addition and subtraction is to store integers that represent the minor currency unit (e.g. cents when dealing with dollars). This avoids any rounding errors. If you need to perform division, then you will need to add guard digits and perform rounding according to the proscribed method. This may involve doing starting off with all the multiplications, additions abd subtractions and doing divisions last, because 17% of a sum of values may not equal the sum of 17% of the values.

Floating point is always an approximation. This is good for scientific number crunching, but you also have to be aware of where quantization errors could slip in (e.g. adding values in ascending order to maximize precision)

(10) By Keith Medcalf (kmedcalf) on 2022-12-13 17:24:03 in reply to 8 [link] [source]

The common way of handling "money" for addition and subtraction is to store integers that represent the minor currency unit

This is not only untrue but doing so is a violation of the law and would subject the person who did this and the person(s) who approved it to life imprisonment for fraud if the data so manipulated was intended for "external" disemination and not simply for "internal entertainment value".

The legal requirement is that "dollars" be stored to 1/100th of a penny. This has been the case since the advent of mechanical/electrical/electronic tabulating equipment.

It is also the reason why the standard "money" type is a fixed decimal with four places type (that is usually emulated with scale restricted floating point). See MS SQL Server for a johhny-cum-lately example.

binary16 unsuitable for any money use
binary32 absolute value must never exceed $1e3
binary64 absolute value must never exceed $1e11
binary128 absolute value must never exceed $1e29
binary256 absolute value must never exceed $1e67

and so on and so forth.

(11) By Gunter Hick (gunter_hick) on 2022-12-14 06:36:23 in reply to 10 [link] [source]

Ok, so that is the "proscribed method" in your jurisdiction.

(16) By Simon Slavin (slavin) on 2022-12-14 13:01:13 in reply to 10 [link] [source]

I am not disputing your statement, I'm just interested in it. Do you have a cite for

The legal requirement is that "dollars" be stored to 1/100th of a penny

(20) By Keith Medcalf (kmedcalf) on 2022-12-14 15:40:40 in reply to 16 [link] [source]

I would suggest contacting what we call a "Chartered Bank" and ask them the question. If you manage to actually get hold of someone who knows what they are talking about you will find the answer very informative.

You might also ask an certified accountant, however, they are unlikely to know since they do not actually "compute" correct results, only verify them.

(21) By Keith Medcalf (kmedcalf) on 2022-12-14 15:45:02 in reply to 16 [link] [source]

I believe IBM first added the guard digits to their computations in the late 50's or early 60's. Multiplication (in respect of money) was rarely performed by machine prior to this. Although when I was a kid one of by friends dad owned and ran a textile mill and they had all sorts of nifty mechanical tabulating machines including a "mechanical computer" that could multiply (it took many turns of the hand-crank but it would do it accurately).

(22) By Tim Streater (Clothears) on 2022-12-14 16:12:35 in reply to 21 [link] [source]

In that context, this may interest you:

https://en.wikipedia.org/wiki/LEO_(computer)

(24) By Keith Medcalf (kmedcalf) on 2022-12-14 16:37:27 in reply to 22 [link] [source]

Insteresting. The thing I am talking about however was clearly a tabulator. It was about 2 feet by 1 foot and about 8 inches thick.

THe "input" consisted of about twelve columns of mechanical keys labelled 0 though 9 and another column that contained "operation" keys. Along the top edge was a series of mechanical wheels emblozened each with the numbers 0 through 9.

One entered the operation and the operand by depressing the appropriate keys. Once the entry was made, you would turn the crank until all the depressed keys "popped back up" and the "display wheels" displayed the number that you had input.

You then pressed the "compute" button and turned the crank round and round and round until the thing dinged at which point the number on the display was the answer to your "incremental operation".

You could then enter another "input" set, and incrementally compute the next result. Carry on. Some operations were completed with just one or two rotations of the hand crank. Some (such as multiplication) required a number of turns proportional to the sum of the digits of the multiplicand.

This particular machine had been replaced by one which used an "electric motor" to do the cranking rather than "operator hand-crank power".

(9) By cj (sqlitening) on 2022-12-13 17:00:54 in reply to 1 [source]

12-digits using real in this test.
Saved "as is" when wrapped in single quotes and cast as blob. 

drop table if exists t;
create table t(c real);
insert into t values('100.000000000001');
insert into t values(CAST('100.00000000000000001' AS BLOB));
select c,typeof(c) from t;

(17) By jose isaias cabrera (jicman) on 2022-12-14 14:42:08 in reply to 9 [link] [source]

But, there is some data loss using BLOB:

drop table if exists t;
create table t(c real);
insert into t values('100.000000000001');
insert into t values(CAST('100.00000000000000001' AS BLOB));
insert into t values('100.000000000001');
select c,typeof(c) from t;
sqlite> select total(c) from t;
┌──────────────────┐
│     total(c)     │
├──────────────────┤
│ 300.000000000002 │
└──────────────────┘

(18.1) By Keith Medcalf (kmedcalf) on 2022-12-14 15:30:33 edited from 18.0 in reply to 17 [link] [source]

That is because it is an excercize in lunacy.

Just because you change the "type code" from "null terminated bytes of ASCII text representing a number as a text string" into "a fixed length bag of bytes of ASCII text representing a number as a text string but without the null terminator" does not change anything at all.

There is no difference in the bytes contained in the null terminated text string 'hello baby' that there is in cast('hello baby' as blob) except that in the former case there is a null terminator (it is a C string) and in the latter it is a pascal string. The "bytes" are exactly the same.

sqlite> select hex('hello baby'), hex(cast('hello baby' as blob));
┌────────────────────────┬─────────────────────────────────┐
│   hex('hello baby')    │ hex(cast('hello baby' as blob)) │
├────────────────────────┼─────────────────────────────────┤
│ '68656C6C6F2062616279' │ '68656C6C6F2062616279'          │
└────────────────────────┴─────────────────────────────────┘

(19) By Keith Medcalf (kmedcalf) on 2022-12-14 15:29:27 in reply to 17 [link] [source]

That is to say that in both cases the text string is cast to a IEEE-754 double and then passed to the "total" aggregate which adds them up, and returns the IEEE-754 double result.

There would be no difference whatsoever if the values were the "C-style (null terminated) text representation of a floating point number", the "pascal style (length bound) text representation of a floating point number", or actual floating point numbers directly. In all cases the inputs are "converted" to IEEE-754 double precision floating point numbers so they can be added up by the total aggregate function which returns an IEEE-754 double precision floating point result.

Converting them to text and blob serves no purpose other than wasting CPU and memory for a futile endeavour.

(23) By jose isaias cabrera (jicman) on 2022-12-14 16:37:24 in reply to 19 [link] [source]

Converting them to text and blob serves no purpose other than wasting CPU and memory for a futile endeavour.

I agree. But, for those that want to convert the data to BLOB, they should know that there is data loss in that exchange:

drop table if exists t;
create table t(c real);
insert into t values(CAST('100.00000000000000001' AS BLOB));
insert into t values(CAST('100.00000000000000001' AS BLOB));
insert into t values(CAST('100.00000000000000001' AS BLOB));
select c,typeof(c) from t;
sqlite> select total(c) from t;
┌──────────┐
│ total(c) │
├──────────┤
│ 300.0    │
└──────────┘

drop table if exists t;
create table t(c real);
insert into t values('100.000000000001');
insert into t values('100.000000000001');
insert into t values('100.000000000001');
select c,typeof(c) from t;
sqlite> select total(c) from t;
┌──────────────────┐
│     total(c)     │
├──────────────────┤
│ 300.000000000003 │
└──────────────────┘

(25) By Keith Medcalf (kmedcalf) on 2022-12-14 16:46:05 in reply to 23 [link] [source]

Of course the answers are different, the question is different.

'100.00000000000000001'
'100.000000000001'

when converted to IEEE-754 double precision floating point are different numbers. Therefore it is only natural to expect that the result of "adding each of them converted to IEEE-754 floating point 3 times" would yeild a different result.

Just like adding 2 + 2 + 2 gives a different result than adding 42 + 42 + 42.

(27) By cj (sqlitening) on 2022-12-14 16:55:29 in reply to 25 [link] [source]

drop table if exists t;
create table t(c real);
insert into t values(.00001);
insert into t values(.00001);
insert into t values(.00001);
select total(c) from t; 

// 3.0e-05

I never use floating point numbers, but was surprised by this answer.

(28.3) By Keith Medcalf (kmedcalf) on 2022-12-14 19:03:02 edited from 28.2 in reply to 27 [link] [source]

Why would you be surprised? It has been formatted for your convenience and, in this case, that convenience happens to coincide with your perception of reality.

sqlite> drop table if exists t;
sqlite> create table t(c real);
sqlite> insert into t values(.00001);
sqlite> insert into t values(.00001);
sqlite> insert into t values(.00001);
sqlite> select total(c) from t;
┌──────────┐
│ total(c) │
├──────────┤
│ 3.0e-05  │
└──────────┘
sqlite> select format('%!.26f', c) from t;
┌────────────────────────────────┐
│      format('%!.26f', c)       │
├────────────────────────────────┤
│ '0.00001000000000000000081748' │
│ '0.00001000000000000000081748' │
│ '0.00001000000000000000081748' │
└────────────────────────────────┘
sqlite> select format('%!.26f', total(c)) from t;
┌────────────────────────────────┐
│   format('%!.26f', total(c))   │
├────────────────────────────────┤
│ '0.00003000000000000000414815' │
└────────────────────────────────┘
sqlite>

Note that the "exactly rounded" result should be:

sqlite> select format('%!.26f', sigdigits(total(c))) from t;
┌───────────────────────────────────────┐
│ format('%!.26f', sigdigits(total(c))) │
├───────────────────────────────────────┤
│ '0.00003000000000000000075894'        │
└───────────────────────────────────────┘
sqlite> select ulps(sigdigits(total(c)), total(c)) from t;
┌─────────────────────────────────────┐
│ ulps(sigdigits(total(c)), total(c)) │
├─────────────────────────────────────┤
│ -1.0                                │
└─────────────────────────────────────┘

Which is a diference of 1 ULP.

Note that the sum does not equal what is "printed for convenience":

sqlite> select total(c) == 3.0e-05 from t;
┌─────────────────────┐
│ total(c) == 3.0e-05 │
├─────────────────────┤
│ 0                   │
└─────────────────────┘

because the values differ by 1 ULP.

(29) By jose isaias cabrera (jicman) on 2022-12-14 18:41:18 in reply to 28.1 [link] [source]

Any other questions? :-) Thanks Keith. I am glad you're in this forum.

(30) By cj (sqlitening) on 2022-12-14 20:43:59 in reply to 27 [link] [source]

drop table if exists t;
create table t(c real);
insert into t values(.00001);
insert into t values(.00001);
insert into t values(.00001);
select format('%.5f', total(c)) from t
// 0.00003 instead of 3.0e-05

(31.1) By Keith Medcalf (kmedcalf) on 2022-12-14 21:05:39 edited from 31.0 in reply to 30 [link] [source]

What is your point?

sqlite> drop table if exists t;
sqlite> create table t(c real);
sqlite> insert into t values(.00001);
sqlite> insert into t values(.00001);
sqlite> insert into t values(.00001);
sqlite> select format('%.4f', total(c)) from t;
┌──────────────────────────┐
│ format('%.4f', total(c)) │
├──────────────────────────┤
│ '0.0000'                 │
└──────────────────────────┘
sqlite>

How you "format the displayed value" for printing has nothing whatsoever to do with its value. Just because you print it with the format %.5f does not mean that the value is 0.00003 anymore than printing it with the format %.4f means that the value is 0.0000.

Notwithstanding how you print it, the "exactly rounded IEEE-754 binary64 value" to 26 base-10 decimal places, is 0.00003000000000000000075894.

If you were very lucky and carried out all your operations in extended precision and then then rounded down to a double at the end (ie, performed the computation "exactly rounded" would you get the result 0.00003 "exactly rounded".

Rather the computation is carried out as the sum of the "exactly rounded" values, which has a a result that is different from the results that would have been obtained if one did the computation exactly, and then rounded. And that difference is, in this case, 1 ULP.

(32) By cj (sqlitening) on 2022-12-14 22:37:29 in reply to 31.1 [link] [source]

To display the correct total requires 5-decimal positions.
The correct result   0.00003 is displayed using '%.5f'
The incorrect result 0.0000  is displayed using '%.4f'

drop table if exists t;
create table t(c real);
insert into t values(.00001);
insert into t values(.00001);
insert into t values(.00001);
select format('%.5f', total(c)) from t;
0.00003

(26) By Keith Medcalf (kmedcalf) on 2022-12-14 16:49:19 in reply to 23 [link] [source]

See:

sqlite> select format('%!.26f', 100.00000000000000001) as value union all select format('%!.26f', 100.000000000001);
┌───────────────────────────────┐
│             value             │
├───────────────────────────────┤
│ '100.0'                       │
│ '100.00000000000099475549325' │
└───────────────────────────────┘