Decimal128
(1) By naklers on 2020-06-19 10:44:04 [link] [source]
Hi,
I made a quick search and couldn't find any topic related to Decimal128, so here we go:
Decimal128 (https://en.wikipedia.org/wiki/Decimal128_floating-point_format) is not natively supported. I could store the number as text and search by casting to REAL, but that means I would lose precision, which means the search could yield invalid results.
What could be a workaround for this?
Thanks, Juan
(2) By Warren Young (wyoung) on 2020-06-19 16:25:29 in reply to 1 [link] [source]
What could be a workaround for this?
- Sponsor the feature
- Use sqlite3decimal
- Strap an arbitrary precision arithmetic library to your app and store the library’s data structure as a BLOB
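For illustration, a minimal sketch of the store-as-TEXT route with the arithmetic pushed into the application, using Python's standard sqlite3 and decimal modules (the table name and context settings are invented for the example; the context only roughly matches Decimal128's 34-digit envelope):

import sqlite3
from decimal import Decimal, Context

DEC128 = Context(prec=34, Emin=-6143, Emax=6144)   # roughly the Decimal128 envelope

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE amounts (id INTEGER PRIMARY KEY, amount TEXT)")
db.execute("INSERT INTO amounts (amount) VALUES (?)",
           ("0.10000000000000000000000000000001",))

# The text round-trips losslessly, so arithmetic done in the application stays exact.
(text_value,) = db.execute("SELECT amount FROM amounts").fetchone()
total = DEC128.add(Decimal(text_value), Decimal("0.2"))
print(total)   # 0.30000000000000000000000000000001

# Caveat: range/ordering queries on the TEXT column are lexicographic, not numeric,
# so they need extra work (an application-defined function, or the extension above).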
(3) By Warren Young (wyoung) on 2020-06-19 17:13:45 in reply to 2 [link] [source]
Also, this is an XY Problem. Instead of telling us what the solution must be, tell us what your actual problem is and let the forum's brain trust suggest better solutions.
Chances are excellent that you do not actually need 34 digits of precision. That's roughly sufficient for universe-scale problems at the precision of dust motes. That might sound good until you realize that no computer on the planet can usefully run a program big enough to do anything useful with that sort of model.
In other words, Decimal128 is stupid-big for most purposes. There's probably a better solution for your problem, if only we knew what it was.
(4) By naklers on 2020-06-19 17:23:56 in reply to 3 [link] [source]
Thanks Warren for the suggestions, and you are right. I can hardly imagine how this is useful in the business domain, but unfortunately it is not my decision...
(5) By Warren Young (wyoung) on 2020-06-19 17:37:59 in reply to 4 [link] [source]
Why do you believe you have no leverage?
Go back to the people making these presumably-uninformed decisions and say, "SQLite has no Decimal128 data type. Given that you aren't storing such data in SQLite now, you cannot claim there is a legacy of data in SQLite that we still need to access. Short of paying someone to add Decimal128 to SQLite, we're going to have to do a data transform if we want to store the data in SQLite. Therefore, let us reconsider what the requirements are on our data type, to better match our actual needs."
Inference: you have some other DBMS that does provide a Decimal128, and those in charge of that DBMS said, "Yay, all the precision we could possibly want!" and went with that without thinking about what they actually needed.
TANSTAAFL, and that's the soul of engineering.
(14.1) By naklers on 2020-06-20 13:26:19 edited from 14.0 in reply to 5 [link] [source]
Thanks Warren. It's complicated :) We are trying to replicate server data on mobile devices, and the server has been supporting Decimal128 for some time now...
(6) By Warren Young (wyoung) on 2020-06-19 18:42:58 in reply to 3 [link] [source]
That's roughly sufficient for universe-scale problems at the precision of dust motes.
It's Friday here, so let's have a bit of fun. Given:
- a limit on the size of the visible universe approximately 100 billion light years in diameter
- household dust size approximately in the range of 1 to 100 µm [Source]
- 34 decimal digits in the significand for Decimal128 [Source]
Then: At approximately 10 trillion kilometers to the light-year, 100 × 10^12 ly ≅ 10^27 km ≅ 10^30 m ≅ 10^36 µm.
Therefore, we can precisely localize large (100 µm) dust particles at this scale.
Oh, and having done all of this, we still have a −6143 to +6144 power-of-10 scaling constant in addition to all of this, so a quick estimate says we could instead accurately describe the position of oxygen atoms at the scale of a galaxy, or the position of individual quarks at the scale of the cosmic neighborhood.
Now please explain again why you need Decimal128? 😜
(15.1) By naklers on 2020-06-20 13:25:55 edited from 15.0 in reply to 6 [link] [source]
I'm sure going to pass on this information :D
(7) By Richard Damon (RichardDamon) on 2020-06-19 23:50:22 in reply to 3 [link] [source]
Actually, it may not be as out of line if you think about it a bit differently. If you can show, or reasonably believe, that Float64 representation is insufficient (and there are numerous spots where that can be true), then it isn't illogical to think of stepping up to a 128-bit representation (since they do tend to come in powers of two). If the domain is financial (or something else that deals with exact decimal fractions), then thinking at the same time of switching to a decimal representation makes sense. Decimal64 MIGHT work, but world-scale numbers start to come very close to running out of precision (you only get 16 digits), especially if doing calculations that need fractions of pennies for things like interest calculation.
And Decimal64 isn't that much of a help, as SQLite doesn't support it either (at least out of the box).
You don't need to need the full resolution of the new format to be reasonable; you just need to show that the next option down is possibly insufficient. Maybe a less standard format, more akin to Float/Decimal-80/96, might make sense, but then you are backing yourself into a corner with something less standard.
(9) By Warren Young (wyoung) on 2020-06-20 02:07:02 in reply to 7 [link] [source]
...Float64 representation is insufficient...
Separate issue. If someone wants to start advocating for extended precision floats, then I wouldn't oppose that at all, simply because it's built into the FPU on many processors.
Whether doing so is advisable is a separate matter, since it's likely to make your DB non-portable. Still, if you have a problem big enough that it requires extended precision FP arithmetic, you're probably sufficiently grown up to accept this limitation.
What we're discussing in this thread is a) going straight to 128 bits, rather than something more moderate like 80 or 96 bit; and b) decimal representation, which is better for countable things rather than measurable things, which brings us back to questions of the maximum number of tightly-packed dust particles in the visible universe.
If the domain is Financial...
...then something on the order of 10^15 should suffice to account for all of the money in the world, to the precision of pennies.
...which no one bank actually does, there being more than one global-scale bank, so no one has all the marbles.
...a decimal representation makes sense [for financial applications]
Yes. But whether DECIMAL is good for money is an entirely separate matter from whether you need a 34-digit significand.
world-scale numbers start to come very close to running out of precision (you only get 16 digits)
Floating-point was made for measurable quantities, where 16 digits likely exceeds the accuracy of your instrument by orders of magnitude. Very few measurement instruments go to 9 digits. Most run out of useful precision in the 3-5 digit range.
Thus 32, 24, and 16-bit FP. They often suffice.
If you're doing discrete math with countable objects, then you probably shouldn't be using FP at all.
And Decimal64 isn't that much of a help, as SQLite doesn't support it either
Certainly, but it'd be much easier to make a case for having it than for Decimal128.
(17) By Richard Damon (RichardDamon) on 2020-06-20 13:59:41 in reply to 9 [link] [source]
What we're discussing in this thread is a) going straight to 128 bits, rather than something more moderate like 80 or 96 bit; and b) decimal representation, which is better for countable things rather than measurable things, which brings us back to questions of the maximum number of tightly-packed dust particles in the visible universe.
If you read the IEEE Standards document, there actually IS a big reason to jump to 128-bit formats. The 32 / 64 / 128-bit formats are described as 'Basic Floating-Point Formats' and are fully specified. Other formats like 80 and 96 bits are recognized and allowed, are called extended floating-point formats, and are NOT fully defined. Several different versions of 'Float80' are allowed, with differing allocations of bits between exponent and mantissa, with just the restriction of needing at least as many bits as Float64 and no more than Float128. Thus the 128-bit format is the next truly standardized, and thus most portable, format. (It might not be available, but if it is, it WILL work and not give crazy values.)
Decimal format floating-point makes sense if, for legal or other purposes, you want to exactly mimic how paper-and-pencil math would work with numbers with decimal fractions. SOMETIMES you can replace that with scaled fixed point, but sometimes that overcomplicates the algorithm, especially if at times you need to deal with decimal fractions of that basic unit. This is the basic historical reason the decimal floating-point format was invented: sometimes people want computers to get the same answer that we would if we did it by hand, and we do it by hand in decimal. Yes, it shouldn't really matter if the World GDP isn't computed accurately to the penny, but sometimes people are illogical and make demands like that.
(18) By anonymous on 2020-06-20 15:37:03 in reply to 9 [link] [source]
and b) decimal representation, which is better for countable things rather than measurable things, which brings us back to questions of the maximum number of tightly-packed dust particles in the visible universe.
That's nonsense: If you are dealing with countable things, all you need is unsigned integers.
Decimal representation only comes into play if you insist on cutting up your countables into powers of ten. But there is no particularly good reason for that.
(20) By Richard Damon (RichardDamon) on 2020-06-20 17:26:15 in reply to 18 [link] [source]
Decimal representation only comes into play if you insist on cutting up your countables into powers of ten. But there is no particularly good reason for that.
Actually, the 'Good Reason' is if you want the computer calculation to match what you would get if you did it via paper and pencil (and then you also specify exactly how to round at each step). Decimal representation is designed (and was so developed) to mimic how we do the math 'by hand'.
Sometimes, this requirement/detail is specified in law or accept practice documents.
Think of how many times a beginner asks a question like why is 0.1 * 10.0 NOT 1.000000 and has to be taught about this 'issue' with binary floating-point. Except for backward compatibility reasons, the slight additional cost of doing decimal floating-point is small enough that if we were to start over, it could be argued that it would make more sense to use decimal floating-point rather than binary. The biggest impediment is the inertia of the current system. (I can't see very many applications where the decimal format doesn't meet the needs of something currently using the binary format, except for the cost to switch.)
(21.1) By Larry Brasfield (LarryBrasfield) on 2020-06-20 18:07:27 edited from 21.0 in reply to 20 [link] [source]
There are a couple of efficiency based reasons for binary floating point.
One is that the underlying representation in any "digital" hardware in common use is a collection of binary states, conventionally called 1 and 0 (but manifested as various voltage levels, charge absence/presence, magnetization polarity, and maybe some others). Decimal representations require more binary bits per possible represented state (by about 20 percent) because 10 is well below the next higher power of 2.
The other is that hardware designed to do arithmetic would have to be more complex to manipulate "decimal" representations. This is because they must themselves be represented as a small number of binary states and because decimal arithmetic is inherently more complex than binary arithmetic. (There are many more input and output values for the fundamental digit operations.) And that additional complexity, while quite manageable by modern design methods, would necessarily slow down the operations. Without going into off-topic detail here: more complex logic to describe digit set transformations becomes greater logic propagation delay, which in turn requires deeper pipelines or lower clock rates.
So there is more than inertia behind the popularity of binary FP.
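The "about 20 percent" figure above is easy to reproduce with a quick back-of-the-envelope check in Python (illustrative only):

import math

bits_of_information = math.log2(10)   # one decimal digit carries about 3.32 bits
bits_spent_in_bcd = 4                 # a plain BCD encoding spends 4 bits per digit
print(bits_spent_in_bcd / bits_of_information - 1)   # ~0.204, i.e. about 20% overhead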
(Edited for grammar and markdown mess-up.)
(24) By Richard Damon (RichardDamon) on 2020-06-20 19:45:42 in reply to 21.1 [link] [source]
You do realize that the decimal floating-point format stores the mantissa as a binary number (not BCD encoded) and is generally based on the fact that 2^10 is just slightly bigger than 10^3, so 10 bits can store a number between 0 and 999 with a few unused codes, and 50 bits can store numbers between 0 and 999,999,999,999,999 with a few unused codes. So all the arithmetic being done in decimal floating-point is done with the normal binary hardware. The only extra work is in the normalization, where we need multiplication by powers of 10 vs powers of 2 (a shift). (There is also a small decoder to convert 5 bits into the upper digit of the mantissa and the two upper bits of the exponent, which are limited to 0, 1, 2.)
It is a very different format than the old Decimal Fixed Point notation that stored numbers as packed BCD digits.
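The bit-packing described above is easy to sanity-check, for example in Python (illustrative only):

# 10 bits comfortably cover three decimal digits, and 50 bits cover fifteen,
# leaving only a few unused codes in each case.
print(2**10, 10**3)    # 1024 vs 1000 -> 24 unused codes
print(2**50, 10**15)   # 1125899906842624 vs 1000000000000000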
(22) By Keith Medcalf (kmedcalf) on 2020-06-20 18:18:33 in reply to 20 [link] [source]
But 0.1 * 10.0 does equal 1.0 if your arithmetic unit is working properly.
And actually your argument that "the slight additional costs to do Decimal Floating Point is small enough that if we were to start over, it could be argued that it would make more sense to use decimal floating-point rather than binary" is completely bonkers.
The fact of the matter is that in the "olden days" when "dinosaurs roamed the earth" and the computing power that you now have in your wrist-watch filled rooms the size of football stadiums, arithmetic and floating-point calculations were indeed performed in Decimal.
However, they no longer are done that way and the CISC instructions that used to exist in CPUs to do that sort of arithmetic have long since been retired. The reason? It was proven that there was no point, that it was terribly inefficient, that it had extremely high cost for no discernible advantage, and that it was inherently more error prone and had a wider error margin than using base-2 floating-point.
It is sort of like once upon a time there were people that believed the Earth was flat (apparently there still are) even though we learned better. Once upon a time people believed that the Sun and stars rotated around the Earth. Then they believed that the Earth and stars rotated around the Sun. But now we have a more complete understanding that everything is moving away from and rotating around a single point of origin. For the moment. Who knows what we will "discover" tomorrow.
What you are saying is that perhaps we should revert to the belief that the Earth is Flat simply because computing euclidean geometry is "easier" than computing spherical geometry, despite its known problems and inaccuracies when dealing with real-world problems.
This is to deny advances in knowledge simply because you do not like them and are prejudiced in your belief system. This is basically the same sort of behaviour as confronted Galileo.
(23) By Larry Brasfield (LarryBrasfield) on 2020-06-20 18:42:51 in reply to 22 [link] [source]
I'm not going to quibble with the above merely hyperbolic assertions. (They are not misleading because they are so obviously overstated.) However, there is one semi-untruth to be remedied:
... the CISC instructions that used to exist in CPUs to do that sort of arithmetic have long since been retired.
In fact, to support doing decimal arithmetic, which is still a software-intensive proposition compared to using a modern floating point arithmetic unit, nearly every microprocessor made since they were invented, up to today's 64-bit word crunchers, has had instructions called DAA and DAS (or similar). For example, see the Intel® 64 and IA-32 Architectures Software Developer's Manual at pages 3-291 and 3-293. These instructions perform a post-addition or post-subtraction fix-up upon a pair of decimal digits held in an 8-bit register after similar pairs are added or subtracted. While primitive, this speeds up decimal arithmetic done in software by eliminating a lot of test/branch operations. There is even a processor ALU flag to help with these fix-ups.
To grant the "... retired" assertion its due: These instructions are available only in legacy and compatibility modes. I suppose that means they are retired. But they can be readily coaxed out of their retirement.
(26) By Richard Damon (RichardDamon) on 2020-06-20 19:59:47 in reply to 22 [link] [source]
But 0.1 * 10.0 does equal 1.0 if your arithmetic unit is working properly.
Since there exists no such exact number (in binary floating-point) as 0.1, it can't if your arithmetic unit is working properly.
0.1 in binary is the repeating number 0.0001100110011...
If you round that pattern to the nearest 64-bit binary floating-point number you will get
0.1000000000000000055511151231257827021181583404541015625
which, when you multiply by ten, gives
1.000000000000000055511151231257827021181583404541015625
which may display as 1.0000 but will not compare equal.
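The digit string above can be verified directly, for example with Python's decimal module, which converts a binary double exactly:

from decimal import Decimal

# Decimal(float) preserves the binary value digit for digit:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625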
(27) By Keith Medcalf (kmedcalf) on 2020-06-20 22:55:15 in reply to 26 [link] [source]
SQLite version 3.33.0 2020-06-20 06:25:07
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select 0.1*10.0 == 1.0;
┌─────────────────┐
│ 0.1*10.0 == 1.0 │
├─────────────────┤
│ 1 │
└─────────────────┘
sqlite>
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1 * 10.0 == 1.0
True
>>>
So your assertion that 0.1 * 10.0 != 1.0 is incorrect.
(28) By Richard Damon (RichardDamon) on 2020-06-21 00:36:29 in reply to 27 [link] [source]
I'll admit it: it looks like I remembered the wrong rounding problem; the error in 0.1 is small enough that the error when you multiply by 10 still rounds back to 1.0.
But try 0.1 + 0.2 == 0.3. That fails, and that sort of failure, which pops up all over the place, is one of the issues with binary floating-point.
If you think of floating-point numbers as approximations to actual numbers you are OK, but we like to think of our data as what we actually entered. If we enter the name Robert, we don't want Roberta; if we say we have 54321 widgets in inventory, we don't want the value 54320 back. When we enter a number with a decimal, we mean precisely that value (yes, we may know the value is a measurement with a tolerance, but the number is the number).
For many fields, the approximation works, but in others the errors are unacceptable. If I add up thousands of sales with decimals, I expect to get the EXACT result, but I need to carefully manage my computations to get it right to the penny. This is where using Decimal floating-point has a big advantage, now the computer is dealing with the exact same numbers the people are, not just some approximations.
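A minimal Python illustration of the two behaviours, using the standard decimal module as a stand-in for a decimal type:

from decimal import Decimal

# Binary doubles: both sides are approximations, and they land on different doubles.
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Decimal: the written values are stored exactly, so the addition is exact too.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True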
(29) By Wout Mertens (wmertens) on 2020-06-21 08:26:10 in reply to 28 [link] [source]
0.1 + 0.2 == 0.3 That fails
Well, yes. The binary patterns on both sides are not exactly the same.
When comparing floats, you should test if they are within 1 Epsilon of each other, then they are equivalent.
Furthermore, you should always store all your data in full precision, and display it to the user with their expected precision.
So all in all, a lot of errors come from the fact that languages make it too easy to do the wrong thing. One should be able to express "almost equal" and "round to 2 significant digits" in a non-annoying way. Storing in full precision is easy of course.
There is a caveat with money though: any intermediate result you show to users should be replaced with its rounded value. Otherwise you run the risk that calculations with non-terminating quotients (when you're dividing) round to a different last digit than when you repeat the calculation with the rounded values. This problem would also happen with Decimal128.
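Python already happens to expose both pieces, which makes for a compact illustration (choosing the tolerance is still the caller's problem, as discussed further down the thread):

import math

a = 0.1 + 0.2
# "almost equal": a relative tolerance scaled to the magnitude of the operands
print(math.isclose(a, 0.3))   # True (default rel_tol=1e-09)

# "display precision": round only what the user sees; keep full precision in storage
print(f"{a:.2f}")             # 0.30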
(30) By Wout Mertens (wmertens) on 2020-06-21 08:46:37 in reply to 29 [link] [source]
Some more about that last point: I am pretty convinced that that sort of rounding problem is what people most associate with floating point precision even though they are not related.
These things crop up when converting from a weekly to a daily rate, for example. If some property is $800 per week, the daily rate would be $114.29, but a week's worth of the daily rate is $800.03.
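For example, a quick Python reproduction of those figures; the same rounding mismatch shows up even with exact decimal arithmetic, which is the point being made:

from decimal import Decimal, ROUND_HALF_UP

weekly = Decimal("800")
daily = (weekly / 7).quantize(Decimal("0.01"), ROUND_HALF_UP)
print(daily)       # 114.29
print(daily * 7)   # 800.03 -- seven rounded daily rates no longer add up to the weekly rate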
(33) By Keith Medcalf (kmedcalf) on 2020-06-21 15:45:08 in reply to 30 [link] [source]
Quite bogus because that is generally not how things work in the real world.
In the real world the daily rate would be $200 and the weekly rate $800. The hourly rate would probably be $100 and the half-hour rate be $75. The monthly rate would probably be $2800 or thereabouts.
(36) By Wout Mertens (wmertens) on 2020-06-21 16:49:23 in reply to 33 [link] [source]
In the real world the daily rate would be $200 and the weekly rate $800.
Actually, no, not always, this is how one of my customers does it. But I was just illustrating my point, and judging from your other answers I think we're on the same page.
(31) By Richard Damon (RichardDamon) on 2020-06-21 11:19:49 in reply to 29 [link] [source]
And the issue here is that there is NO easy way to decide exactly what value of epsilon you should use for a comparison, because how accurate your result is will be very much a product of what math you have done in the middle and the relative size of the numbers.
The core of the issue is that binary floating-point needs to be thought of as approximations of real numbers (especially ones with fractions), even for numbers with low precision, while decimal floating-point can EXACTLY represent written numbers as long as they are within the precision range. If something costs $1.19, that is an EXACT number, but in binary floating-point I can't express it, just a 'nearest value', and I need to keep rounding to keep the approximations from deteriorating. Decimal floating-point gets around this problem by using the same base that we do, so it can exactly represent the same numbers we would write down, so the numbers we think of as exact are exact (until we write too many digits).
Yes, with decimal floating-point, after doing a multiply or divide, you likely need to perform a rounding operation if you want to exactly mimic the paper-and-pencil result, because that will be needed to mirror the decision of what precision that operation is to be performed to. What you can do is add/subtract an unlimited number of these, and as long as your sum doesn't exceed your precision limits, the result will be exact, because the input numbers will be, and the math will be. This is different from binary floating-point, where we started with a representational error, and that error will accumulate with the addition.
(34) By Keith Medcalf (kmedcalf) on 2020-06-21 15:52:14 in reply to 31 [link] [source]
Actually the value of Epsilon (the machine precision) is a constant. It is stunningly simple to decide within "how many epsilons" two numbers are of each other: you simply subtract them from each other and compute how many "scaled epsilons" they are apart.
Easy peasy lemon squeezy.
Then you decide what your tolerance for error is in "scaled epsilons". If the number is less than this threshold, then the two original numbers are equal. If the threshold is exceeded, then they are not equal.
This is easy to do because, lo and behold, IEEE floating point arithmetic makes certain guarantees as to the accuracy of the numbers it uses and the accuracy of its calculations.
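A minimal sketch of that recipe in Python (math.ulp needs Python 3.9 or later; the 4-unit tolerance is an arbitrary choice for illustration, not a recommendation):

import math

def units_apart(a: float, b: float) -> float:
    """How many units-in-the-last-place of the larger operand separate a and b."""
    return abs(a - b) / math.ulp(max(abs(a), abs(b)))

x = 0.1 + 0.2
print(units_apart(x, 0.3))        # 1.0 -- the two values are one representable step apart
print(units_apart(x, 0.3) <= 4)   # True under a 4-unit tolerance, so "equal"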
(43) By Richard Damon (RichardDamon) on 2020-06-22 11:08:46 in reply to 34 [link] [source]
In theory, it may be easy to compute, for a given operation, but in practice, it gets hard, unless you do a correction after almost every operation, which frankly isn't practical.
For example, what scaled epsilon should you use for the result of:
SELECT SUM(price) as total_sales FROM Sales;
Let us say you have on the order of a million records and an average sales price on the order of $1000.
The round-off error, by my calculation, now possibly exceeds a penny, and you can't really assume independence of the rounding to give you likely cancellation, as the penny values in prices aren't really 'random' but are very often things like .99, so we may well have consistent rounding biases.
The fundamental issue here is that the database didn't store the data we really wanted, prices are EXACT quantities, but we just stored an approximation. (Which is one of the major reasons DECIMAL formats got a start in business programming, they can keep exact numbers as exact).
(44) By Keith Medcalf (kmedcalf) on 2020-06-22 13:24:42 in reply to 43 [link] [source]
It is not hard at all. And the problem you posit does not exist. Consider the following where xi is in "pennies" and xf is in floating "dollars":
sqlite> create table x(xi, xf);
sqlite> insert into x (xi) select randomv(200000) from wholenumber where value between 1 and 1000000;
sqlite> update x set xf = xi/100.0;
sqlite> select * from x limit 10;
┌────────┬─────────┐
│ xi │ xf │
├────────┼─────────┤
│ 107368 │ 1073.68 │
│ 125617 │ 1256.17 │
│ 126619 │ 1266.19 │
│ 28429 │ 284.29 │
│ 175912 │ 1759.12 │
│ 185600 │ 1856.0 │
│ 67209 │ 672.09 │
│ 96405 │ 964.05 │
│ 94741 │ 947.41 │
│ 33114 │ 331.14 │
└────────┴─────────┘
sqlite> select sum(xi)/100.0, sum(xf) from x;
┌───────────────┬─────────────┐
│ sum(xi)/100.0 │ sum(xf) │
├───────────────┼─────────────┤
│ 999476637.6 │ 999476637.6 │
└───────────────┴─────────────┘
sqlite> select avg(cast(xi as real))/100.00, avg(xf) from x;
┌──────────────────────────────┬─────────────┐
│ avg(cast(xi as real))/100.00 │ avg(xf) │
├──────────────────────────────┼─────────────┤
│ 999.4766376 │ 999.4766376 │
└──────────────────────────────┴─────────────┘
sqlite>
The epsilon of double precision floating point has nothing to do with anything since the magnitude of the values and the intermediates is well below the precision limits of the representation.
You are correct, however, that if you carry out the computation in pure integer arithmetic you will get the wrong answer:
sqlite> select sum(xi)/count(*) from x;
┌──────────────────┐
│ sum(xi)/count(*) │
├──────────────────┤
│ 99947 │
└──────────────────┘
(45.1) By Keith Medcalf (kmedcalf) on 2020-06-22 14:53:41 edited from 45.0 in reply to 44 [link] [source]
Also you will note that:
sqlite> select epsilon(2000);
┌──────────────────────┐
│ epsilon(2000) │
├──────────────────────┤
│ 2.27373675443232e-13 │
└──────────────────────┘
sqlite> select epsilon(999476637.6);
┌──────────────────────┐
│ epsilon(999476637.6) │
├──────────────────────┤
│ 1.19209289550781e-07 │
└──────────────────────┘
sqlite> select epsilon(999.4766376);
┌──────────────────────┐
│ epsilon(999.4766376) │
├──────────────────────┤
│ 1.13686837721616e-13 │
└──────────────────────┘
That all the numbers have epsilons which indicate accuracy to 1/100000 of a penny.
So I am pretty sure that accuracy "to the penny" cannot be an issue.
(46) By Richard Damon (RichardDamon) on 2020-06-22 15:32:20 in reply to 45.1 [link] [source]
But epsilon is the error from a SINGLE operation. If we add 10^6 numbers together, the possible error is nearly 10^6 times as large, so the error is on the order of 1e-01.
Unless you think that SQLite should have a SUM operation (and the like), with a defined rounding point to apply after every (or every so often) operation, this error will accumulate over the full sum, so you need to compute it for the full sum.
(47) By Keith Medcalf (kmedcalf) on 2020-06-22 17:29:49 in reply to 46 [link] [source]
So if you add 999476637.6 together 1 million times, you get 999476637600000.0 which has an epsilon of 0.125.
What is your point? As long as the epsilon of all intermediates/results is less than 1e-4, then all intermediates/results are valid "to the penny" (when rounded nearest, ties to even).
This simple mathematical truth cannot be changed no matter how much you might wish it were not so.
This applies to all binary floating point formats from Half-Precision to Octuple Precision.
sqlite> select value-3, pow(10,value-3), epsilon(pow(10,value-3)) from wholenumber where value between 0 and 18;
┌─────────┬───────────────────┬──────────────────────────┐
│ value-3 │ pow(10,value-3) │ epsilon(pow(10,value-3)) │
├─────────┼───────────────────┼──────────────────────────┤
│ -2 │ 0.01 │ 1.73472347597681e-18 │
│ -1 │ 0.1 │ 1.38777878078145e-17 │
│ 0 │ 1.0 │ 2.22044604925031e-16 │
│ 1 │ 10.0 │ 1.77635683940025e-15 │
│ 2 │ 100.0 │ 1.4210854715202e-14 │
│ 3 │ 1000.0 │ 1.13686837721616e-13 │
│ 4 │ 10000.0 │ 1.81898940354586e-12 │
│ 5 │ 100000.0 │ 1.45519152283669e-11 │
│ 6 │ 1000000.0 │ 1.16415321826935e-10 │
│ 7 │ 10000000.0 │ 1.86264514923096e-09 │
│ 8 │ 100000000.0 │ 1.49011611938477e-08 │
│ 9 │ 1000000000.0 │ 1.19209289550781e-07 │
│ 10 │ 10000000000.0 │ 1.9073486328125e-06 │
│ 11 │ 100000000000.0 │ 1.52587890625e-05 │
│ 12 │ 1000000000000.0 │ 0.0001220703125 │
│ 13 │ 10000000000000.0 │ 0.001953125 │
│ 14 │ 100000000000000.0 │ 0.015625 │
│ 15 │ 1.0e+15 │ 0.125 │
└─────────┴───────────────────┴──────────────────────────┘
So as long as all values/intermediates/results are of a magnitude less than 1e11 (for Binary64 floating point), you are assured that the rounded nearest half-to-even 2-decimal-place result is perfectly accurate.
For "extended precision" it is bigger, at 1e15
sqlite> select value-3, pow(10,value-3), epsilon(pow(10,value-3),64) from wholenumber where value between 0 and 17;
┌─────────┬───────────────────┬─────────────────────────────┐
│ value-3 │ pow(10,value-3) │ epsilon(pow(10,value-3),64) │
├─────────┼───────────────────┼─────────────────────────────┤
│ -2 │ 0.01 │ 8.470329472543e-22 │
│ -1 │ 0.1 │ 6.7762635780344e-21 │
│ 0 │ 1.0 │ 1.0842021724855e-19 │
│ 1 │ 10.0 │ 8.67361737988404e-19 │
│ 2 │ 100.0 │ 6.93889390390723e-18 │
│ 3 │ 1000.0 │ 5.55111512312578e-17 │
│ 4 │ 10000.0 │ 8.88178419700125e-16 │
│ 5 │ 100000.0 │ 7.105427357601e-15 │
│ 6 │ 1000000.0 │ 5.6843418860808e-14 │
│ 7 │ 10000000.0 │ 9.09494701772928e-13 │
│ 8 │ 100000000.0 │ 7.27595761418343e-12 │
│ 9 │ 1000000000.0 │ 5.82076609134674e-11 │
│ 10 │ 10000000000.0 │ 9.31322574615479e-10 │
│ 11 │ 100000000000.0 │ 7.45058059692383e-09 │
│ 12 │ 1000000000000.0 │ 5.96046447753906e-08 │
│ 13 │ 10000000000000.0 │ 9.5367431640625e-07 │
│ 14 │ 100000000000000.0 │ 7.62939453125e-06 │
│ 15 │ 1.0e+15 │ 6.103515625e-05 │
│ 16 │ 1.0e+16 │ 0.0009765625 │
│ 17 │ 1.0e+17 │ 0.0078125 │
└─────────┴───────────────────┴─────────────────────────────┘
Similarly for Binary128 it is 1e29
sqlite> select value-3, pow(10,value-3), epsilon(pow(10,value-3),113) from wholenumber where value between 0 and 34;
┌─────────┬───────────────────┬──────────────────────────────┐
│ value-3 │ pow(10,value-3) │ epsilon(pow(10,value-3),113) │
├─────────┼───────────────────┼──────────────────────────────┤
│ -2 │ 0.01 │ 1.50463276905253e-36 │
│ -1 │ 0.1 │ 1.20370621524202e-35 │
│ 0 │ 1.0 │ 1.92592994438724e-34 │
│ 1 │ 10.0 │ 1.54074395550979e-33 │
│ 2 │ 100.0 │ 1.23259516440783e-32 │
│ 3 │ 1000.0 │ 9.86076131526265e-32 │
│ 4 │ 10000.0 │ 1.57772181044202e-30 │
│ 5 │ 100000.0 │ 1.26217744835362e-29 │
│ 6 │ 1000000.0 │ 1.0097419586829e-28 │
│ 7 │ 10000000.0 │ 1.61558713389263e-27 │
│ 8 │ 100000000.0 │ 1.29246970711411e-26 │
│ 9 │ 1000000000.0 │ 1.03397576569128e-25 │
│ 10 │ 10000000000.0 │ 1.65436122510606e-24 │
│ 11 │ 100000000000.0 │ 1.32348898008484e-23 │
│ 12 │ 1000000000000.0 │ 1.05879118406788e-22 │
│ 13 │ 10000000000000.0 │ 1.6940658945086e-21 │
│ 14 │ 100000000000000.0 │ 1.35525271560688e-20 │
│ 15 │ 1.0e+15 │ 1.0842021724855e-19 │
│ 16 │ 1.0e+16 │ 1.73472347597681e-18 │
│ 17 │ 1.0e+17 │ 1.38777878078145e-17 │
│ 18 │ 1.0e+18 │ 1.11022302462516e-16 │
│ 19 │ 1.0e+19 │ 1.77635683940025e-15 │
│ 20 │ 1.0e+20 │ 1.4210854715202e-14 │
│ 21 │ 1.0e+21 │ 1.13686837721616e-13 │
│ 22 │ 1.0e+22 │ 1.81898940354586e-12 │
│ 23 │ 1.0e+23 │ 1.45519152283669e-11 │
│ 24 │ 1.0e+24 │ 1.16415321826935e-10 │
│ 25 │ 1.0e+25 │ 1.86264514923096e-09 │
│ 26 │ 1.0e+26 │ 1.49011611938477e-08 │
│ 27 │ 1.0e+27 │ 1.19209289550781e-07 │
│ 28 │ 1.0e+28 │ 1.9073486328125e-06 │
│ 29 │ 1.0e+29 │ 1.52587890625e-05 │
│ 30 │ 1.0e+30 │ 0.0001220703125 │
│ 31 │ 1.0e+31 │ 0.0009765625 │
└─────────┴───────────────────┴──────────────────────────────┘
And for Binary256 it would be 1e67
sqlite> select value-3, pow(10,value-3), epsilon(pow(10,value-3),237) from wholenumber where value between 0 and 73;
┌─────────┬───────────────────┬──────────────────────────────┐
│ value-3 │ pow(10,value-3) │ epsilon(pow(10,value-3),237) │
├─────────┼───────────────────┼──────────────────────────────┤
│ -2 │ 0.01 │ 7.07474928033337e-74 │
│ -1 │ 0.1 │ 5.6597994242667e-73 │
│ 0 │ 1.0 │ 9.05567907882671e-72 │
│ 1 │ 10.0 │ 7.24454326306137e-71 │
│ 2 │ 100.0 │ 5.7956346104491e-70 │
│ 3 │ 1000.0 │ 4.63650768835928e-69 │
│ 4 │ 10000.0 │ 7.41841230137484e-68 │
│ 5 │ 100000.0 │ 5.93472984109987e-67 │
│ 6 │ 1000000.0 │ 4.7477838728799e-66 │
│ 7 │ 10000000.0 │ 7.59645419660784e-65 │
│ 8 │ 100000000.0 │ 6.07716335728627e-64 │
│ 9 │ 1000000000.0 │ 4.86173068582902e-63 │
│ 10 │ 10000000000.0 │ 7.77876909732643e-62 │
│ 11 │ 100000000000.0 │ 6.22301527786114e-61 │
│ 12 │ 1000000000000.0 │ 4.97841222228891e-60 │
│ 13 │ 10000000000000.0 │ 7.96545955566226e-59 │
│ 14 │ 100000000000000.0 │ 6.37236764452981e-58 │
│ 15 │ 1.0e+15 │ 5.09789411562385e-57 │
│ 16 │ 1.0e+16 │ 8.15663058499816e-56 │
│ 17 │ 1.0e+17 │ 6.52530446799852e-55 │
│ 18 │ 1.0e+18 │ 5.22024357439882e-54 │
│ 19 │ 1.0e+19 │ 8.35238971903811e-53 │
│ 20 │ 1.0e+20 │ 6.68191177523049e-52 │
│ 21 │ 1.0e+21 │ 5.34552942018439e-51 │
│ 22 │ 1.0e+22 │ 8.55284707229503e-50 │
│ 23 │ 1.0e+23 │ 6.84227765783602e-49 │
│ 24 │ 1.0e+24 │ 5.47382212626882e-48 │
│ 25 │ 1.0e+25 │ 8.75811540203011e-47 │
│ 26 │ 1.0e+26 │ 7.00649232162409e-46 │
│ 27 │ 1.0e+27 │ 5.60519385729927e-45 │
│ 28 │ 1.0e+28 │ 8.96831017167883e-44 │
│ 29 │ 1.0e+29 │ 7.17464813734306e-43 │
│ 30 │ 1.0e+30 │ 5.73971850987445e-42 │
│ 31 │ 1.0e+31 │ 4.59177480789956e-41 │
│ 32 │ 1.0e+32 │ 7.3468396926393e-40 │
│ 33 │ 1.0e+33 │ 5.87747175411144e-39 │
│ 34 │ 1.0e+34 │ 4.70197740328915e-38 │
│ 35 │ 1.0e+35 │ 7.52316384526264e-37 │
│ 36 │ 1.0e+36 │ 6.01853107621011e-36 │
│ 37 │ 1.0e+37 │ 4.81482486096809e-35 │
│ 38 │ 1.0e+38 │ 7.70371977754894e-34 │
│ 39 │ 1.0e+39 │ 6.16297582203915e-33 │
│ 40 │ 1.0e+40 │ 4.93038065763132e-32 │
│ 41 │ 1.0e+41 │ 7.88860905221012e-31 │
│ 42 │ 1.0e+42 │ 6.31088724176809e-30 │
│ 43 │ 1.0e+43 │ 5.04870979341448e-29 │
│ 44 │ 1.0e+44 │ 8.07793566946316e-28 │
│ 45 │ 1.0e+45 │ 6.46234853557053e-27 │
│ 46 │ 1.0e+46 │ 5.16987882845642e-26 │
│ 47 │ 1.0e+47 │ 8.27180612553028e-25 │
│ 48 │ 1.0e+48 │ 6.61744490042422e-24 │
│ 49 │ 1.0e+49 │ 5.29395592033938e-23 │
│ 50 │ 1.0e+50 │ 8.470329472543e-22 │
│ 51 │ 1.0e+51 │ 6.7762635780344e-21 │
│ 52 │ 1.0e+52 │ 5.42101086242752e-20 │
│ 53 │ 1.0e+53 │ 8.67361737988404e-19 │
│ 54 │ 1.0e+54 │ 6.93889390390723e-18 │
│ 55 │ 1.0e+55 │ 5.55111512312578e-17 │
│ 56 │ 1.0e+56 │ 8.88178419700125e-16 │
│ 57 │ 1.0e+57 │ 7.105427357601e-15 │
│ 58 │ 1.0e+58 │ 5.6843418860808e-14 │
│ 59 │ 1.0e+59 │ 4.54747350886464e-13 │
│ 60 │ 1.0e+60 │ 7.27595761418343e-12 │
│ 61 │ 1.0e+61 │ 5.82076609134674e-11 │
│ 62 │ 1.0e+62 │ 4.65661287307739e-10 │
│ 63 │ 1.0e+63 │ 7.45058059692383e-09 │
│ 64 │ 1.0e+64 │ 5.96046447753906e-08 │
│ 65 │ 1.0e+65 │ 4.76837158203125e-07 │
│ 66 │ 1.0e+66 │ 7.62939453125e-06 │
│ 67 │ 1.0e+67 │ 6.103515625e-05 │
│ 68 │ 1.0e+68 │ 0.00048828125 │
│ 69 │ 1.0e+69 │ 0.0078125 │
│ 70 │ 1.0e+70 │ 0.0625 │
└─────────┴───────────────────┴──────────────────────────────┘
(49) By Richard Damon (RichardDamon) on 2020-06-22 20:23:35 in reply to 47 [link] [source]
We are adding a sum of a million numbers of magnitude 1000, thus the sum will be of magnitude 10^9, which has an epsilon of 1.2×10^-7. We are adding a million of these, so our error is 10^6 × 1.2×10^-7 = 0.12, and thus we have no certainty of being right to the penny.
The issue is that even if individual epsilons are small, unless we have an opportunity in the calculation to perform a rounding to our power of ten, they WILL accumulate.
Add up a hundred 1e-4 epsilons, and you get a possible 1e-2 error. That is basic math.
Or are you proposing that we somehow redefine our basic math primitives to include periodic rounding to a defined number of decimal places?
Note, the math won't fail all the time, and maybe not even most of the time, which is part of the problem: errors like this take effort to root out, take detailed study to locate, and have the habit of being noticed at the worst time.
As an example, let all 1 million records have a value of 999.95 and sum them. When I do this in python, I get a sum of 999950000.0175645, which when rounded to the penny is now off 2 cents, and unless SQLite3 does special summing rules to minimize accumulated summing errors, it should get about the same.
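Both the drift and the fact that it disappears when the partial sums are tracked exactly can be reproduced in a few lines of Python (math.fsum accumulates without intermediate rounding; this illustrates the accumulation effect only, not what SQLite itself does):

import math

prices = [999.95] * 1_000_000

print(sum(prices))        # 999950000.0175645 -- naive left-to-right accumulation drifts
print(math.fsum(prices))  # 999950000.0       -- exactly tracked partial sums round once at the end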
(51) By Tim Streater (Clothears) on 2020-06-22 20:35:29 in reply to 49 [link] [source]
"Add up a hundred 1e-4 epsilons, and you get a possible 1e-2 error. That is basic math"
I'm not sure this is right. These epsilons do not all have the same sign, so some will cancel out. I did study this stuff (about error propagation) but that was 50 years ago and I've forgotten it all now. Suggest you consult a mathematician.
(52) By Richard Damon (RichardDamon) on 2020-06-22 20:54:38 in reply to 51 [link] [source]
These epsilons do not all have the same sign,
But they MIGHT all have the same sign (or most of them). That is the difference between looking at an 'expected' error and a worst-case error. IF you know the numbers coming in are uncorrelated, you can assume a lot of cancellations. So if you are starting with numbers that truly are approximations, and independent of each other, the error is a lot smaller.
Money though, at its root is an exact number, and financial numbers often aren't that independent, and you are often worried about 100% accuracy, not just good to 'x' sigmas, so the analysis rules are different.
It is common to assume independence because it makes things a lot simpler. It is rarely 100% true, but often true enough to get away with it. If you really need the right answer, it gets a lot harder.
(53) By Larry Brasfield (LarryBrasfield) on 2020-06-22 20:54:53 in reply to 51 [link] [source]
It is true that the errors are unlikely to have the same sign. And they are unlikely to all be a whole epsilon. Hence, to assess probable error, some analysis is necessary, together with some assumptions about probability distribution, and likely deviation from the means.
But Mr. Damon qualified the "1e-2 error" as being "possible". None of that fancy statistical analysis is going to contradict that assertion; it can only show likelihood, not the extremes of possibility. (In fact, such analysis can obscure those extremes.)
(54) By Keith Medcalf (kmedcalf) on 2020-06-22 21:04:52 in reply to 49 [link] [source]
Well, that is because whatever platform/compiler is being used does not know how to do arithmetic correctly (ie, it is not generating IEEE compliant operations). Here are two results with exactly the same code, both run on the same machine on the same Operating System, one after the other. (FYI the hardware is an Intel Xeon E3-1503M v6 with the latest microcode and the OS is Windows 10 2004 build 19041.329 x64.)
You will note that the ONLY difference is the compiler that was used to generate the machine code. You will note that MSVC (the current version) arrived at the same wrong answer as you claim Python does (and also Python compiled with the MSVC compiler also has the same inability to do basic arithmetic). But the version compiled with MinGW GCC arrives at the correct answer.
Apparently you can and should be blaming defective tools for defective answers. Note that this is not an inherent problem with IEEE arithmetic, it is a problem with using defective tools and then laying blame at the wrong doorstep.
(Note, although MinGW does perform "basic arithmetic" that is IEEE compliant, many of the mathematical functions (transcendental functions) leave "quite a bit to be desired" in terms of precision, many of them not returning results within 1 ULP precision).
SQLite version 3.33.0 2020-06-21 17:13:20
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> pragma compile_options;
┌────────────────────────────────┐
│ compile_options │
├────────────────────────────────┤
│ ALLOW_COVERING_INDEX_SCAN │
│ ALLOW_URI_AUTHORITY │
│ COMPILER=msvc-1926 │
│ DEFAULT_CACHE_SIZE=-1048576 │
│ DEFAULT_FOREIGN_KEYS │
│ DEFAULT_PAGE_SIZE=4096 │
│ DEFAULT_RECURSIVE_TRIGGERS │
│ DEFAULT_WAL_AUTOCHECKPOINT=256 │
│ DEFAULT_WAL_SYNCHRONOUS=1 │
│ DEFAULT_WORKER_THREADS=8 │
│ ENABLE_8_3_NAMES=1 │
│ ENABLE_API_ARMOR │
│ ENABLE_BYTECODE_VTAB │
│ ENABLE_COLUMN_METADATA │
│ ENABLE_COLUMN_USED_MASK │
│ ENABLE_COSTMULT │
│ ENABLE_CURSOR_HINTS │
│ ENABLE_DBSTAT_VTAB │
│ ENABLE_FTS3 │
│ ENABLE_FTS3_PARENTHESIS │
│ ENABLE_FTS4 │
│ ENABLE_FTS5 │
│ ENABLE_JSON1 │
│ ENABLE_LOAD_EXTENSION │
│ ENABLE_LOCKING_STYLE=1 │
│ ENABLE_MEMORY_MANAGEMENT │
│ ENABLE_PREUPDATE_HOOK │
│ ENABLE_RBU │
│ ENABLE_RTREE │
│ ENABLE_SESSION │
│ ENABLE_SNAPSHOT │
│ ENABLE_STAT4 │
│ ENABLE_STMT_SCANSTATUS │
│ ENABLE_UNKNOWN_SQL_FUNCTION │
│ ENABLE_UPDATE_DELETE_LIMIT │
│ EXPLAIN_ESTIMATED_ROWS │
│ EXTRA_INIT=core_init │
│ LIKE_DOESNT_MATCH_BLOBS │
│ MAX_ATTACHED=15 │
│ MAX_WORKER_THREADS=16 │
│ SOUNDEX │
│ STAT4_SAMPLES=64 │
│ TEMP_STORE=1 │
│ THREADSAFE=1 │
│ USE_DATETIME_NEW │
│ USE_PRECISE_TIME │
│ USE_URI │
└────────────────────────────────┘
sqlite> select sum(999.95) from wholenumber where value between 1 and 1000000;
┌──────────────────┐
│ sum(999.95) │
├──────────────────┤
│ 999950000.017565 │
└──────────────────┘
sqlite> ^Z
-- Loading resources from C:\Users\KMedcalf/.sqliterc
SQLite version 3.33.0 2020-06-21 17:13:20
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> pragma compile_options;
┌────────────────────────────────┐
│ compile_options │
├────────────────────────────────┤
│ ALLOW_COVERING_INDEX_SCAN │
│ ALLOW_URI_AUTHORITY │
│ COMPILER=gcc-9.1.0 │
│ DEFAULT_CACHE_SIZE=-1048576 │
│ DEFAULT_FOREIGN_KEYS │
│ DEFAULT_PAGE_SIZE=4096 │
│ DEFAULT_RECURSIVE_TRIGGERS │
│ DEFAULT_WAL_AUTOCHECKPOINT=256 │
│ DEFAULT_WAL_SYNCHRONOUS=1 │
│ DEFAULT_WORKER_THREADS=8 │
│ ENABLE_8_3_NAMES=1 │
│ ENABLE_API_ARMOR │
│ ENABLE_BYTECODE_VTAB │
│ ENABLE_COLUMN_METADATA │
│ ENABLE_COLUMN_USED_MASK │
│ ENABLE_COSTMULT │
│ ENABLE_CURSOR_HINTS │
│ ENABLE_DBSTAT_VTAB │
│ ENABLE_FTS3 │
│ ENABLE_FTS3_PARENTHESIS │
│ ENABLE_FTS4 │
│ ENABLE_FTS5 │
│ ENABLE_JSON1 │
│ ENABLE_LOAD_EXTENSION │
│ ENABLE_LOCKING_STYLE=1 │
│ ENABLE_MEMORY_MANAGEMENT │
│ ENABLE_PREUPDATE_HOOK │
│ ENABLE_RBU │
│ ENABLE_RTREE │
│ ENABLE_SESSION │
│ ENABLE_SNAPSHOT │
│ ENABLE_STAT4 │
│ ENABLE_STMT_SCANSTATUS │
│ ENABLE_UNKNOWN_SQL_FUNCTION │
│ ENABLE_UPDATE_DELETE_LIMIT │
│ EXPLAIN_ESTIMATED_ROWS │
│ EXTRA_INIT=core_init │
│ HAVE_ISNAN │
│ LIKE_DOESNT_MATCH_BLOBS │
│ MAX_ATTACHED=15 │
│ MAX_WORKER_THREADS=16 │
│ SOUNDEX │
│ STAT4_SAMPLES=64 │
│ TEMP_STORE=1 │
│ THREADSAFE=1 │
│ USE_DATETIME_NEW │
│ USE_PRECISE_TIME │
│ USE_QUADMATH │
│ USE_URI │
└────────────────────────────────┘
sqlite> select sum(999.95) from wholenumber where value between 1 and 1000000;
┌─────────────┐
│ sum(999.95) │
├─────────────┤
│ 999950000.0 │
└─────────────┘
sqlite>
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from math import *
>>> a = []
>>> while len(a) < 1000000: a.append(999.95)
...
>>> len(a)
1000000
>>> sum(a)
999950000.0175645
>>> epsilon(sum(a))
1.1920928955078125e-07
>>>
(55) By Keith Medcalf (kmedcalf) on 2020-06-22 21:25:35 in reply to 54 [link] [source]
Actually, on checking, I also modified func.c to use extended precision for the intermediate in the sum. When I compile with GCC (MinGW) with the extended precision turned off, I get the same result as MSVC.
SQLite version 3.33.0 2020-06-22 16:13:37
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> pragma compile_options;
┌────────────────────────────────┐
│ compile_options │
├────────────────────────────────┤
│ ALLOW_COVERING_INDEX_SCAN │
│ ALLOW_URI_AUTHORITY │
│ COMPILER=gcc-9.1.0 │
│ DEFAULT_CACHE_SIZE=-1048576 │
│ DEFAULT_FOREIGN_KEYS │
│ DEFAULT_PAGE_SIZE=4096 │
│ DEFAULT_RECURSIVE_TRIGGERS │
│ DEFAULT_WAL_AUTOCHECKPOINT=256 │
│ DEFAULT_WAL_SYNCHRONOUS=1 │
│ DEFAULT_WORKER_THREADS=8 │
│ ENABLE_8_3_NAMES=1 │
│ ENABLE_API_ARMOR │
│ ENABLE_BYTECODE_VTAB │
│ ENABLE_COLUMN_METADATA │
│ ENABLE_COLUMN_USED_MASK │
│ ENABLE_COSTMULT │
│ ENABLE_CURSOR_HINTS │
│ ENABLE_DBSTAT_VTAB │
│ ENABLE_FTS3 │
│ ENABLE_FTS3_PARENTHESIS │
│ ENABLE_FTS4 │
│ ENABLE_FTS5 │
│ ENABLE_JSON1 │
│ ENABLE_LOAD_EXTENSION │
│ ENABLE_LOCKING_STYLE=1 │
│ ENABLE_MEMORY_MANAGEMENT │
│ ENABLE_PREUPDATE_HOOK │
│ ENABLE_RBU │
│ ENABLE_RTREE │
│ ENABLE_SESSION │
│ ENABLE_SNAPSHOT │
│ ENABLE_STAT4 │
│ ENABLE_STMT_SCANSTATUS │
│ ENABLE_UNKNOWN_SQL_FUNCTION │
│ ENABLE_UPDATE_DELETE_LIMIT │
│ EXPLAIN_ESTIMATED_ROWS │
│ EXTRA_INIT=core_init │
│ HAVE_ISNAN │
│ LIKE_DOESNT_MATCH_BLOBS │
│ MAX_ATTACHED=15 │
│ MAX_WORKER_THREADS=16 │
│ SOUNDEX │
│ STAT4_SAMPLES=64 │
│ TEMP_STORE=1 │
│ THREADSAFE=1 │
│ USE_DATETIME_NEW │
│ USE_PRECISE_TIME │
│ USE_URI │
└────────────────────────────────┘
sqlite> select sum(999.95) from wholenumber where value between 1 and 1000000;
┌──────────────────┐
│ sum(999.95) │
├──────────────────┤
│ 999950000.017565 │
└──────────────────┘
sqlite>
(58) By Richard Damon (RichardDamon) on 2020-06-22 22:33:02 in reply to 55 [link] [source]
Well, that is because whatever platform/compiler is being used does not know how to do arithmetic correctly (ie, it is not generating IEEE compliant operations). Here are two results with exactly the same code, both run on the same machine on the same Operating System, one after each. (FYI the hardware is an Intel Xeon E3-1503M v6 with the latest microcode and the OS is Windows 10 2004 build 19041.329 x64). Actually, on checking, I also modified func.c to use extended precision for the intermediate in the sum. When I compile with GCC (MinGW) with the extended precision turned off, I get the same result as MSVC.
I think this is a perfect example of the complexity of floating-point accuracy, if even you could be so sure that the error I was describing was impossible.
Fundamentally, the issue boils down to the fact that when decimal numbers are input into a system using binary floating-point, by necessity we can't store most numbers exactly and resort to using approximations. Once we start on the path of approximation and start to do a lot of math with these numbers, the errors can grow surprisingly fast (and this example used no magnitude cancellation to get us to big errors faster).
The fundamental advantage of the Decimal number types is that they are able to store our decimal inputs exactly. In many domains, like most scientific ones, that isn't important, but in others, like finance, it is critical. The basic workflow was established long ago with people working with paper and pencils writing out numbers as decimals, so that has become the basic RULE for doing it.
It isn't that Decimal arithmetic is more precise, the epsilons are very close to the same (by design), but that the input numbers can start off precise, and then the intermediate values can stay precise, and thus we can get an answer that is precise as long as we don't exceed our resolution.
I suppose one big reason computers don't have as good support for decimal math is that the financial people who do need it don't really need machines that do it blindingly fast, just machines that can handle a lot of data. So the machines made for the other users, who do need speed but don't need exact decimal accuracy, are good enough for them, as they are likely memory-bottlenecked even with software-based math routines.
I suppose the other interesting thing is that we absolutely wouldn't accept the database slightly changing our text fields to something 'close enough', but we accept it for our floating-point number fields.
(35) By Keith Medcalf (kmedcalf) on 2020-06-21 16:02:18 in reply to 31 [link] [source]
Certainly there is. The machine epsilon for double precision is 2.220446049250313e-16 which is pretty damn small. You only run out of precision when your value exceeds 10 to the 11, which leaves you with pennies and two guard digits of precision. This is quite sufficient accuracy for all "exactly rounded" operations to 1/100th of a unit without exception, though if you are paranoid you might want to be able to maintain 3 guard digits in which case your magnitude limit is 10 to the 10.
(37) By anonymous on 2020-06-21 17:16:22 in reply to 35 [link] [source]
Actually the value of Epsilon (the machine precision) is a constant. ... The machine epsilon for double precision is 2.220446049250313e-16
I don't think it is constant (well, that's my feeling, actually).
What's the epsilon for:
1234567890.1 + 0.2 == 1234567890.3
(38.1) By Keith Medcalf (kmedcalf) on 2020-06-21 19:02:59 edited from 38.0 in reply to 37 [link] [source]
Well, those two computational results happen to be exactly equal. That is, the distance between them is 0 ulp (zero Units in the Last Place).
sqlite> select ulps(1234567890.3, 1234567890.1 + 0.2);
┌────────────────────────────────────────┐
│ ulps(1234567890.3, 1234567890.1 + 0.2) │
├────────────────────────────────────────┤
│ 0.0 │
└────────────────────────────────────────┘
The value of the "error instilled by a unit in the last place" is:
sqlite> select ulp(1234567890.3);
┌──────────────────────┐
│ ulp(1234567890.3) │
├──────────────────────┤
│ 2.38418579101563e-07 │
└──────────────────────┘
I calculate the ULP as,
#include <math.h>   /* frexp, ldexp */

/*
 * Compute the ULP error for a floating point number of scale `value`
 * with `significand` bits in the significand.
 */
static __inline double ulp(double value, int significand)
{
    int exponent;
    (void)frexp(value, &exponent);   /* frexp is called only for its exponent output */
    return ldexp(1.0, exponent - significand);
}
The "epsilon" can be calculated as:
def epsilon(x):
    x = abs(x)
    e = x
    while (x + e) != x:
        e /= 2.0
    return e * 2.0
which will return the "minimum increment" to the next representable number in terms of the number provided -- that is, the smallest number which when added to x does not still result in x -- this does not work for 0.
The "epsilon" of 1234567890.3 is:
>>> epsilon(1234567890.3)
1.3706456972739644e-07
Note that the ULP and the Epsilon are not the same. The ULP, in theory, returns the "distance" between the next smaller representable number and the next larger representable number. The epsilon returns the minimum power-of-2 which, when added to x, gives a different result than x.
(41) By Larry Brasfield (LarryBrasfield) on 2020-06-21 23:16:45 in reply to 37 [link] [source]
In the C language, (which is remotely on-topic in this thread/forum), there are constants defined via #define constructs, FLT_EPSILON and DBL_EPSILON, in <float.h>, which represent FP value resolution between 1.0 (inclusive) and 2.0 (exclusive).
(42) By Keith Medcalf (kmedcalf) on 2020-06-22 00:18:34 in reply to 41 [link] [source]
Yes. The "standard" value of "epsilon" according to the ISO C standard is the difference between the value 1.0 and the next larger representable value.
#include <stdio.h>
#include <float.h>
void main(void)
{
    printf("FLT_EPSILON = %.17g\n", FLT_EPSILON);
    printf("DBL_EPSILON = %.17g\n", DBL_EPSILON);
}
FLT_EPSILON = 1.1920928955078125e-007
DBL_EPSILON = 2.2204460492503131e-016
Here are the epsilon values for the most common binary floating formats (Half Precision, Single Precision, Double Precision, Extended Precision, Quadruple Precision and Octuple Precision)
sqlite> select epsilon(1,11) as "Epsilon Float16", epsilon(1,24) as "Epsilon Float32", epsilon(1,53) as "Epsilon Float64", epsilon(1,64) as "Epsilon Float80", epsilon(1,113) as "Epsilon Float128", epsilon(1, 237) as "Epsilon Float256";
┌─────────────────┬──────────────────────┬──────────────────────┬─────────────────────┬──────────────────────┬──────────────────────┐
│ Epsilon Float16 │ Epsilon Float32 │ Epsilon Float64 │ Epsilon Float80 │ Epsilon Float128 │ Epsilon Float256 │
├─────────────────┼──────────────────────┼──────────────────────┼─────────────────────┼──────────────────────┼──────────────────────┤
│ 0.0009765625 │ 1.19209289550781e-07 │ 2.22044604925031e-16 │ 1.0842021724855e-19 │ 1.92592994438724e-34 │ 9.05567907882671e-72 │
└─────────────────┴──────────────────────┴──────────────────────┴─────────────────────┴──────────────────────┴──────────────────────┘
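Those per-format epsilons are simply 2^(1−p) for a p-bit significand, so the row can be regenerated in a couple of lines of plain Python (shown here only as a cross-check):

import math

# epsilon(1, p): the gap between 1.0 and the next representable value with a p-bit significand
for p in (11, 24, 53, 64, 113, 237):
    print(p, math.ldexp(1.0, 1 - p))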
(48) By anonymous on 2020-06-22 20:13:11 in reply to 42 [link] [source]
I'm not sure whether I've overlooked it, but you seem to be using some extension function for "SELECT epsilon(1,11)", etc.
Can they be downloaded somewhere?
(56) By Keith Medcalf (kmedcalf) on 2020-06-22 21:49:39 in reply to 48 [link] [source]
You can find a whole bunch of loadable extensions at:
http://www.dessus.com/files/sqlite3extensions.zip
The particular math and floating pointing functions are in:
http://www.dessus.com/files/sqlmath.c
http://www.dessus.com/files/sqlfcmp.c
(the epsilon/ulp/ulps/rounding functions are in the latter sqlfcmp.c)
The zip file is so big because it also contains the (I believe current) GeoJSON for creation of the lat/long to timezone database plus the wee Python programs that will create the database for use by the "modified" datetime functions.
(57) By Warren Young (wyoung) on 2020-06-22 21:58:48 in reply to 56 [link] [source]
So let's summarize. All we need to do to use binary floating point arithmetic in place of DECIMAL is:
- Fix all of the "broken" compilers and wait for them to be sufficiently common that we can assume their broad use.
- Rebuild all of the SQLite-using tools built by those compilers (e.g. any Windows build of Python built by VC++).
- Either get this epsilon() into SQLite proper, or make everyone needing this add the extension to their SQLite.
- Teach all of the world's programmers currently using DECIMAL how to use epsilon() properly and safely, which as we see from the length of this thread is not trivial.
Or... Or! We could just have some form of DECIMAL in SQLite.
(60) By Richard Damon (RichardDamon) on 2020-06-22 23:54:35 in reply to 57 [link] [source]
Fix all of the "broken" compilers and wait for them to be sufficiently common that we can assume their broad use.
Actually, if you look elsewhere on the thread, he later posted that the version that got the 'right' answer actually was 'cheating' by using Float80s to do the math, so those extra bits of precision did the trick.
The epsilon tool doesn't actually do anything to directly solve the problem; it just lets you easily query what your computational uncertainty might be, so you can work out (maybe) whether you have a problem.
(63) By anonymous on 2020-06-23 00:44:48 in reply to 60 [link] [source]
Yes, he did post that he had forgotten that he had patched a C file that he was using. I imagine that must have been a difficult post to write.
We all learn something every day, and I'm looking forward to his further input on this thread.
(66) By Keith Medcalf (kmedcalf) on 2020-06-23 17:51:04 in reply to 60 [source]
Actually, the one that gets the exact answer is using quadruple precision intermediates (Float128).
The extended precision "long double" still arrives at an answer that is inexact when rounded to a double, ie: 999950000.000008
I had just made sure that all "long double" in the codebase is using the declaration LONGDOUBLE_TYPE and that if the compiler is GCC then
#define LONGDOUBLE_TYPE __float128
rather than
#define LONGDOUBLE_TYPE long double
in config.h
The LONGDOUBLE_TYPE is defined as "long double" in sqliteInt.h if it is not already defined. Since the config.h is processed before sqliteInt.h, the definition in config.h takes precedence.
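For anyone wanting to reproduce this, here is a sketch of the kind of config.h override being described. Only the LONGDOUBLE_TYPE macro name and the __float128 substitution come from the post; the __GNUC__ guard is my assumption of how "if the compiler is GCC" was expressed:

    /* Sketch of the config.h patch described above: force SQLite's
       LONGDOUBLE_TYPE to GCC's __float128 so intermediates use quadruple
       precision.  sqliteInt.h only defines LONGDOUBLE_TYPE as "long double"
       when it is not already defined, and config.h is processed first, so
       this definition wins. */
    #if defined(__GNUC__) && !defined(LONGDOUBLE_TYPE)
    #define LONGDOUBLE_TYPE __float128
    #endif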
(65) By Wout Mertens (wmertens) on 2020-06-23 05:49:21 in reply to 57 [link] [source]
Or... Or! We could just have some form of DECIMAL in SQLite.
You forgot the other option.
The problem is that money values look a lot like floats but they aren't.
They are countable values with a required precision (4 digits, to avoid divide-by-7-and-add-7-times errors) and display precision (2 digits), PLUS the digit significance changes over the lifetime of the coin, even in the space of years.
So instead of creating an inefficient new type, you can also implement the calculations on this type properly: FIXED_SUM(column, precision = 4)
etc. After every step you round to the precision.
Less work, same result, but more efficient.
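A rough sketch of how such an aggregate could be built as a loadable extension follows. The function name fixed_sum, the default precision of 4, and the use of C's round() (ties away from zero) are assumptions for illustration; this is not an existing SQLite feature:

    /* Hypothetical fixed_sum(value, precision) aggregate: each input and the
       running total are rounded to the requested number of decimal digits,
       i.e. "after every step you round to the precision".  Sketch only. */
    #include <math.h>
    #include <sqlite3ext.h>
    SQLITE_EXTENSION_INIT1

    typedef struct { double total; int prec; int init; } FixedSumCtx;

    static double roundToPrec(double v, int prec) {
        double scale = pow(10.0, prec);
        return round(v * scale) / scale;
    }

    static void fixedSumStep(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
        FixedSumCtx *p = sqlite3_aggregate_context(ctx, sizeof(*p));
        if (p == 0 || argc < 1) return;
        if (!p->init) {
            p->prec = (argc > 1) ? sqlite3_value_int(argv[1]) : 4; /* assumed default: 4 digits */
            p->init = 1;
        }
        if (sqlite3_value_type(argv[0]) == SQLITE_NULL) return;    /* ignore NULLs, like SUM() */
        p->total = roundToPrec(p->total + roundToPrec(sqlite3_value_double(argv[0]), p->prec), p->prec);
    }

    static void fixedSumFinal(sqlite3_context *ctx) {
        FixedSumCtx *p = sqlite3_aggregate_context(ctx, 0);
        sqlite3_result_double(ctx, p ? p->total : 0.0);
    }

    int sqlite3_fixedsum_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
        SQLITE_EXTENSION_INIT2(pApi);
        (void)pzErrMsg;
        return sqlite3_create_function(db, "fixed_sum", -1,
                                       SQLITE_UTF8 | SQLITE_DETERMINISTIC, 0,
                                       0, fixedSumStep, fixedSumFinal);
    }

Loaded into a build, a query such as "SELECT fixed_sum(amount, 2) FROM invoices" (hypothetical table and column names) would then round at every accumulation step.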
(32) By doug (doug9forester) on 2020-06-21 14:38:28 in reply to 29 [link] [source]
There is another problem with rounding when dealing with currency. When you create a journal entry in a double-entry bookkeeping system, calculating the individual entries by rounding often leads you a penny or two off for the total. So you have to add that discrepancy to an arbitrary entry to get the journal to balance. And then, if someone watching is really crazy, you've got to give the discrepancy to a different account the next month, etc.
Just saying...
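A tiny worked example of that discrepancy, in integer cents (the numbers are mine, purely for illustration):

    /* Splitting 100.00 across three journal lines in integer cents: plain
       division gives 33.33 three times (99.99 total), so the leftover cent
       is pushed onto one arbitrary entry to make the journal balance. */
    #include <stdio.h>

    int main(void) {
        long total = 10000;                 /* 100.00, held as cents */
        long share = total / 3;             /* 3333 -> 33.33 */
        long lines[3] = { share, share, share };
        lines[0] += total - 3 * share;      /* the stray penny goes to entry 0 */
        printf("%ld + %ld + %ld = %ld cents\n",
               lines[0], lines[1], lines[2], lines[0] + lines[1] + lines[2]);
        return 0;
    }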
(39) By Keith Medcalf (kmedcalf) on 2020-06-21 19:15:55 in reply to 32 [link] [source]
You are confusing the distributive property of infinite precision arithmetic with the fact that money arithmetic is not distributive, because it is not infinite precision (exact). This has nothing to do with whether you are using base-2 or base-10 or base-8 or base-16 or base-16384 or base-9007199254740992 floating point. You still have precisely the same problem.
You are assuming that (a + b) * something == (a * something) + (b * something)
when dealing specifically with money, and this is not the case.
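A concrete instance of that failure (my own numbers), where every intermediate result is rounded to whole cents:

    /* With results rounded to cents, (a + b) * rate and
       (a * rate) + (b * rate) differ, regardless of the radix the
       underlying arithmetic uses. */
    #include <math.h>
    #include <stdio.h>

    static double cents(double v) { return round(v * 100.0) / 100.0; }

    int main(void) {
        double a = 1.05, b = 1.05, rate = 0.5;
        printf("%.2f\n", cents((a + b) * rate));              /* 1.05 */
        printf("%.2f\n", cents(a * rate) + cents(b * rate));  /* 1.06 */
        return 0;
    }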
(10) By Keith Medcalf (kmedcalf) on 2020-06-20 02:47:42 in reply to 7 [link] [source]
Base 2 floating point is entirely adequate for financial applications. You simply have to keep the numbers within the precision bounds of the format you supply.
- Binary32 is good for magnitudes up to 1e2.
- Binary64 is good for magnitudes up to 1e10.
- Binary128 is good for magnitudes up to 1e29.
Similarly for the Decimal format. The only difference is that the decimal format is good for one more digit at the 8 byte size.
- Decimal32 is good for magnitudes up to 1e2.
- Decimal64 is good for magnitudes up to 1e10 or 1e11.
- Decimal128 is good for magnitudes up to 1e29.
Of course, you have to address "proper exact rounding", but you have to do that with anything you use in the modern world, since correct "exactly rounded" scaled decimal arithmetic is only (and was ever only) supported in COBOL (and PL/1). In all other languages since then you have to program your "exactly rounded" computations very carefully, since there are no (or very few) runtime libraries that will do it for you.
Never trust anyone who says they need "decimal arithmetic" for money, because it is self-evident that they have no knowledge of what they speak; they are only spouting rumours propagated by the ill-informed.
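One way to sanity-check bounds of this kind (my reading, not necessarily the poster's) is to count the decimal digits each binary significand can carry exactly and leave the rest for fractional currency digits and guard digits:

    /* Decimal digits exactly representable in each binary significand:
       roughly bits * log10(2).  How many of those digits are spent on the
       fractional part and on guard digits determines the usable magnitude. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const char *name[] = { "Binary32", "Binary64", "Binary128" };
        int bits[] = { 24, 53, 113 };   /* significand bits, hidden bit included */
        for (int i = 0; i < 3; i++)
            printf("%-10s ~%.1f decimal digits\n", name[i], bits[i] * log10(2.0));
        return 0;
    }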
(8) By Richard Hipp (drh) on 2020-06-20 00:33:56 in reply to 1 [link] [source]
Proposed: A new extension to implement infinite-precision decimal arithmetic on strings. The extension would consist of the following SQL functions:
- decimal_add()
- decimal_sub()
- decimal_mul()
- decimal_div()
And a single new collating function:
- DECIMAL
The functions would take two or three arguments. The first two arguments are strings that are interpreted as decimal numbers. The third argument is a non-negative integer which is the number of digits of precision after the decimal point to preserve. If the third argument is omitted, the precision used is the larger of the precisions of the two input numbers.
The DECIMAL collating function compares two decimal number strings in numeric order. It works similarly to the UINT collating function in that it compares strings in numeric order. The difference is that DECIMAL does not accept text intermixed with the numbers, but it does work with negative numbers and decimal points.
Examples:
- decimal_add('123.4','3450.67',4) → '3574.0700'
- decimal_div('123.4','3450.67',30) → '0.035761171018961535050806332947'
- '123.4' < '3450.67' COLLATE DECIMAL → TRUE
(11) By luuk on 2020-06-20 07:07:47 in reply to 8 [link] [source]
"The third argument is a non-negative integer which is the number of digits of precision after the decimal-point to preserve. If the third argument is omitted, the value used is the larger of the two input numbers."
For decimal_mul(), the default should be the sum of the number of decimal places of the two input numbers.
- 0.3 * 0.66 = something with (1+2=) 3 decimals.
- 0.99 * 0.99 = something with (2+2=) 4 decimals.
(12) By doug (doug9forester) on 2020-06-20 07:42:13 in reply to 8 [link] [source]
Rounding issues:
decimal_add('1.33','1.12',1) = '2.45'-> one of '2.4' or '2.5'
decimal_div('4.55','1.0',1) = '4.55' -> one of '4.5' or '4.6'
What are the rounding rules? (are we dredging up the FP discussion or what? :))
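As an aside before the reply below (not part of either post), the two tie-breaking rules usually on the table can be compared in a few lines of C: round() breaks ties away from zero, while rint() follows the floating-point environment's default nearest-with-ties-to-even mode:

    /* round() takes halfway cases away from zero; rint() honours the current
       rounding mode, which defaults to round-to-nearest, ties-to-even. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double ties[] = { 0.5, 1.5, 2.5, 3.5 };
        for (int i = 0; i < 4; i++)
            printf("%.1f: round() -> %.0f, rint() -> %.0f\n",
                   ties[i], round(ties[i]), rint(ties[i]));
        return 0;   /* prints 1/0, 2/2, 3/2, 4/4 */
    }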
(40) By Keith Medcalf (kmedcalf) on 2020-06-21 19:30:20 in reply to 12 [link] [source]
One would hope that the result is "computed exactly" then "rounded nearest with ties(half) to even" since this has the best error propagation characteristics.
"computed exactly" means that the calculation should be carried out with at least two guard digits in each argument, and thus at least four guard digits in the exact result, before rounding is applied.
(13) By Ryan Smith (cuz) on 2020-06-20 12:54:29 in reply to 8 [link] [source]
Yes please. +1
May I suggest, if not too exotic, to accept both strings and normal floats as parameters?
I would typically use:
SELECT decimal_div(LargeNumberColumn, NormalNumericColumn, d)
FROM t
where that first column contains BigNum decimal strings and the second references some column with normal values, like TaxRate or such.
And possibly then: '123.4' < 345.67 COLLATE DECIMAL → TRUE
Though a cast can easily solve these (at the expense of slightly ugly code).
Thank you!
Ryan
(67) By Keith Medcalf (kmedcalf) on 2020-06-23 17:56:34 in reply to 13 [link] [source]
Well, since they are extension functions, you can use whatever your little heart desires as long as it is not NULL and, when retrieved with sqlite3_value_text(), it returns a byte string that looks like a number expressed in UTF-8 ASCII. So you could use BLOB, TEXT, REAL, or INTEGER, as long as the value survives the conversion to UTF-8 text and "looks like a number".
(16) By naklers on 2020-06-20 13:29:56 in reply to 8 [link] [source]
+1 :)
(19) By Keith Medcalf (kmedcalf) on 2020-06-20 16:56:29 in reply to 8 [link] [source]
If collation could be applied to blobs then there are already arbitrary precision extensions for SQLite3.
(50) By tom (younique) on 2020-06-22 20:27:11 in reply to 8 [link] [source]
Why can't it be implemented directly?
What I mean is a new data type DECIMALFLOAT and an internal binary representation of the data.
It's unnecessarily inconvenient having to call a function when a simple +, -, *, / would suffice.
So, instead of "SELECT decimal_add('1.2','3.4')", it would be much more useful having "SELECT CAST(1.2 AS DECIMALFLOAT)+CAST(3.4 AS DECIMALFLOAT)". (Just added the CASTs to make it clear in this example.)
(59) By Richard Damon (RichardDamon) on 2020-06-22 22:58:35 in reply to 50 [link] [source]
I suppose the biggest issue is that there isn't a 'Type Code' that could easily be used to indicate that the column is storing a Decimal value, so there would be a major binary format incompatibility in creating the type.
One way to try to sneak in such a type would be to take one of the two type codes reserved for internal use, but SQLite3 might need those.
Or I suppose we could use a negative number and reduce the maximum size of a string/blob by a byte (if they allow values that big). That would mean it takes 17 bytes in the database to store a Decimal64 number.
Another option, I suppose, is to make it a second-class citizen: stored in the database as a blob or string, but castable to DECIMALFLOAT so the math is done on decimal float numbers, with the result then either sent to the calling program as such a type or converted to some other type.
(61) By Larry Brasfield (LarryBrasfield) on 2020-06-22 23:59:33 in reply to 59 [link] [source]
Repeating something I've stated before in this forum: if a distinction is made between precise monetary quantities and the results of calculations which inherently produce minute [a] errors, then integer cents can be stored and computation done with integers where absolute precision can be preserved, and binary floating point can be used where absolute precision cannot be expected. No additional types are needed; adequate discipline alone suffices.
[a. No offense intended here but: The epsilon discussions make me wonder if anybody ever did argue about how many angels could dance upon the head of a pin. After seeing this thread, I doubt it less than before. ]
With a few extension functions to more obviously express operations and conversions among the exact cents and pretty darn close cents (or whole monetary unit) forms, it is all quite manageable.
Introducing a new type, where number components equal to negative powers of ten can be exactly represented, is not going to eliminate the need for such discipline. It may disguise the need for it a little better.
(62) By Richard Damon (RichardDamon) on 2020-06-23 00:29:48 in reply to 61 [link] [source]
Yes, you can handle a lot of the monetary issues with scaled integers. But things still get complicated. You need to decide ahead of time what your exact scale factor will be, and woe to you if you get it wrong. For example, what resolution do you store unit prices in? I will tell you as a fact that I have bought parts where the unit price for some of the parts had significant digits in the 1000ths of a cent. (Yes, you had to order at least 1000 of them to get that price, but you didn't need to buy them in multiples of 1000; rounding to the penny was applied to the net price of the part after multiplying by quantity.)
Also, since you end up with intermediate results needing to be rounded off to different numbers of fractional digits, it becomes a lot of work to track all of this down. And since the number types themselves don't carry this scale information, there is a lot of checking and double-checking; then a law changes and you need to add a digit of precision, so everything needs to be checked again and all uses traced down, and if that value was saved in a database, that means checking EVERY program in the system to see if it uses it.
This is why COBOL and similar languages had fixed-point types with declared precision, so IT could handle all that mess.
Decimal types that can exactly represent the values you want to exactly represent don't get rid of all the problems, but they do clean up a lot of them and leave to you the real problems to solve.
(64) By Rowan Worth (sqweek) on 2020-06-23 03:13:33 in reply to 61 [link] [source]
The epsilon discussions make me wonder if anybody ever did argue about how many angels could dance upon the head of a pin.
Anybody in stilettos can answer that old thing:
It's one for the right foot one for the left
Half an angel per pin at best
Add wings add heart add harp all set
("Velodrome" by Dessa)