Log10(100) = 1.9999999999999998 ?!?
(1) By drjdpowell on 2022-05-25 12:26:44 [link] [source]
Hello, I've recently started using a newer version that include teh extra math functions, and I find that the Log10 function is only approximate. For example, Log10(100) = 1.9999999999999998
I previously used an extension called "extension-functions.c" for this. That returned 2, as expected.
Is this a bug, or expected?
Thanks,
-- James Powell
(2) By Richard Hipp (drh) on 2022-05-25 13:12:50 in reply to 1 [link] [source]
Yes, log10() (like must other math functions) gives an approximate answer.
Nevertheless, there is now a change on trunk that makes the answer ever so slightly more accurate, and with this change, the case of log10(100) ends up being exact - at least on my Ubuntu desktop. Your mileage may vary depending on what math library you link against.
(3) By drjdpowell on 2022-05-25 13:32:34 in reply to 2 [link] [source]
I looked at your change [c48a735b] and checked the math outside of SQLite:
Old way Log10(100) = Ln(100)*(1/Ln(10)) = 1.99999999999999978
New way Log10(100) = Ln(100)/Ln(10) = 2
So this was the issue.
-- James
(4) By Larry Brasfield (larrybr) on 2022-05-25 15:14:16 in reply to 3 [link] [source]
So this was the issue.
Only if "the issue" is exactitness of the result for log10(100).
Consider this session screen scrape using a CLI built from the "ever so slightly more accurate" checkin of this morning:
sqlite> select value, value - log10(cast(pow(10,value) as int)) from generate_series(1,18);
1|0.0
2|0.0
3|4.44089209850063e-16
4|0.0
5|0.0
6|8.88178419700125e-16
7|0.0
8|0.0
9|1.77635683940025e-15
10|0.0
11|0.0
12|1.77635683940025e-15
13|1.77635683940025e-15
14|0.0
15|1.77635683940025e-15
16|0.0
17|0.0
18|3.5527136788005e-15
sqlite3>
The maximum absolute value of the above errors is no smaller than what is obtained from the v3.38.0 release. The change has caused more of those results to be 0.0 which might be considered an improvement in an application which happens to need exact results for just the right subset of inputs.
However, errors such as the above and what you saw earlier should be expected when using floating point arithmetic and particularly when the transcendental functions are used. Those functions can only compute approximations for the vast majority of input values and the hardware does not special-case the inputs for which mathematically exact answers are possible.
The "issue" was and is unrealistic expectation rather than a possible incremental accuracy improvement. The change you celebrate as resolving "the issue" will have no effect on FP hardware which does not maintain a larger mantissa for intermediate results than for stored results. The real issue is better resolved by learning the limitations of floating point computer arithmetic.
(5) By drjdpowell on 2022-05-25 16:58:44 in reply to 4 [link] [source]
Well, it was the issue of the difference from the results with Log10 in the extension-function.c extension I have been using. I looked in that code and it calculates Log10 in the same way exactly.
Clearly there is a more accurate algorithm, as the Log10 function in the language I use doesn't exhibit these small inconsistencies, even though it gives the same results you show for the formula Ln(x)/Ln(10)
(6) By Larry Brasfield (larrybr) on 2022-05-25 18:36:26 in reply to 5 [link] [source]
The code behind log10() could be made to always yield integer results for positive integer powers of 10 (within bounds set by mantissa bit count)1, but it would occupy more memory. I would say that such precision for such special cases is not expected if not for the present counter-example. People familiar with the limitations of floating point arithmetic hardware take near-ULP errors in stride and avoid writing code that is thrown off by them.
I do not hope to convince you as to what "the issue" is here. My intention is to help keep unreasonable expectations from being confirmed. We see a never-ending succession of posts here regarding minute floating point errors that are perfectly expectable once the hardware limitations are understood. Nobody should expect such errors to be "fixed" by algorithmic somersaults.
- ^ The algorithm effects normalization by dividing by positive integer powers of 10 followed by detection of a result equal to 1.
(8) By drjdpowell on 2022-05-26 08:14:31 in reply to 6 [source]
I fully agree. I was just concerned that there was a change to a worse algorithm going from the extension-functions.c implementation to SQLite's (which there was).
With the change, I am happy.
(9) By KIT.james (kjames3411) on 2022-05-26 12:18:54 in reply to 8 [link] [source]
which there was
No. (according to this thread's answers)
The algorithm could become better but with less precision. And it would still be better. Or it become worse but with better precision for your cases. And it would still be worse. (what is better or worse, anyway?)
Math precision in computer science is never exact so you should not count on it.
This is not about bad practice but about understanding that if you make programs while asserting that "exact precision has a meaning" then you make buggy programs. You might even make programs that format your hard drive if you have bad luck (this is not a joke).
(10.5) By Keith Medcalf (kmedcalf) on 2022-05-27 05:05:41 edited from 10.4 in reply to 9 [link] [source]
Just for a lark I calculated the error distribution over a sample, using different calculation methods. Interestingly, the results are remakably consistent.
This uses some extensions I created:
ulps(x,y) computes the distance between x and y in units of the epsilon of x
sigdigits(x,d) returns x exactly rounded (nearest, half to even) to d significant digits
avg(x) computes the average by successive approximation to the mean
skewp(x) computes the population skew of the samples
kurtp(x) computes the population kurtosis of the samples
Using the following script:
.timer off
.stats off
.version
.mode list
.header off
select compile_options from pragma_compile_options where compile_options like 'COMPILER%';
.print Units of Epsilon Calculation Error
.mode box --wrap off
.timer on
with ln10(x) as materialized
(
select ln(10.0)
),
constants(M, N, Ten, R, S, StartAt, StopAt) as materialized
(
select 1.0 / x,
x,
10.0,
1,
4,
1,
1e8
from ln10
),
dataset(value, epsilon, epsilon2, epsilon3) as materialized
(
select value,
abs(ulps(value, pow(Ten, log10(value)))) as epsilon,
abs(ulps(value, pow(Ten, ln(value)*M))) as epsilon2,
abs(ulps(value, pow(Ten, ln(value)/N))) as epsilon3
from constants
join generate_series
where start = StartAt
and stop = StopAt
)
select 'pow(10.0, log10(value))' as Method,
StopAt as Samples,
min(epsilon) as Min,
round(avg(epsilon) ,R) as Avg,
max(epsilon) as Max,
sigdigits(skewp(epsilon), S) as SkewP,
sigdigits(kurtp(epsilon), S) as KurtP
from constants, dataset
union all
select 'pow(10.0, ln(value)*(1/ln(10.0)))' as Method,
StopAt as Samples,
min(epsilon2) as Min,
round(avg(epsilon2), R) as Avg,
max(epsilon2) as Max,
sigdigits(skewp(epsilon2), S) as SkewP,
sigdigits(kurtp(epsilon2), S) as KurtP
from constants, dataset
union all
select 'pow(10.0, ln(value)/ln(10.0))' as Method,
StopAt as Samples,
min(epsilon3) as Min,
round(avg(epsilon3), R) as Avg,
max(epsilon3) as Max,
sigdigits(skewp(epsilon3), S) as SkewP,
sigdigits(kurtp(epsilon3), S) as KurtP
from constants, dataset
order by Avg
;
Yields (computer is Windows 10 Pro current version, Xeon processor):
(sqlite3 is compiled with MSVC x64)
(sqlite32 is compiled with MinGW x86)
(sqlite64 is compiled with MinGW x64)
sqlite3 code base is c48a735bd4 using the log10 directly.
>sqlite3 < log10.sql
SQLite 3.39.0 2022-05-25 19:38:32 7390728f43e6498905e569ca0b1c779f33b869165b7e9080e1ca09b4e05d20f3
zlib version 1.2.12
msvc-1929
COMPILER=msvc-1929
Units of Epsilon Calculation Error
┌───────────────────────────────────┬─────────────┬─────┬──────┬──────┬────────┬──────────┐
│ Method │ Samples │ Min │ Avg │ Max │ SkewP │ KurtP │
├───────────────────────────────────┼─────────────┼─────┼──────┼──────┼────────┼──────────┤
│ pow(10.0, log10(value)) │ 100000000.0 │ 0.0 │ 3.3 │ 9.0 │ 0.3206 │ -0.6079 │
│ pow(10.0, ln(value)/ln(10.0)) │ 100000000.0 │ 0.0 │ 10.9 │ 40.0 │ 0.5895 │ -0.1608 │
│ pow(10.0, ln(value)*(1/ln(10.0))) │ 100000000.0 │ 0.0 │ 17.0 │ 50.0 │ 0.4421 │ -0.07938 │
└───────────────────────────────────┴─────────────┴─────┴──────┴──────┴────────┴──────────┘
Run Time: real 149.048 user 134.156250 sys 14.750000
>sqlite32 < log10.sql
SQLite 3.39.0 2022-05-25 19:38:32 7390728f43e6498905e569ca0b1c779f33b869165b7e9080e1ca09b4e05d20f3
zlib version 1.2.12
gcc-10.2.0
COMPILER=gcc-10.2.0
Units of Epsilon Calculation Error
┌───────────────────────────────────┬─────────────┬─────┬──────┬──────┬────────┬──────────┐
│ Method │ Samples │ Min │ Avg │ Max │ SkewP │ KurtP │
├───────────────────────────────────┼─────────────┼─────┼──────┼──────┼────────┼──────────┤
│ pow(10.0, log10(value)) │ 100000000.0 │ 0.0 │ 3.3 │ 9.0 │ 0.3208 │ -0.6078 │
│ pow(10.0, ln(value)/ln(10.0)) │ 100000000.0 │ 0.0 │ 10.9 │ 40.0 │ 0.5895 │ -0.1608 │
│ pow(10.0, ln(value)*(1/ln(10.0))) │ 100000000.0 │ 0.0 │ 17.0 │ 50.0 │ 0.4421 │ -0.07931 │
└───────────────────────────────────┴─────────────┴─────┴──────┴──────┴────────┴──────────┘
Run Time: real 186.107 user 169.062500 sys 16.921875
>sqlite64 < log10.sql
SQLite 3.39.0 2022-05-25 19:38:32 7390728f43e6498905e569ca0b1c779f33b869165b7e9080e1ca09b4e05d20f3
zlib version 1.2.12
gcc-10.2.0
COMPILER=gcc-10.2.0
Units of Epsilon Calculation Error
┌───────────────────────────────────┬─────────────┬─────┬──────┬──────┬────────┬──────────┐
│ Method │ Samples │ Min │ Avg │ Max │ SkewP │ KurtP │
├───────────────────────────────────┼─────────────┼─────┼──────┼──────┼────────┼──────────┤
│ pow(10.0, log10(value)) │ 100000000.0 │ 0.0 │ 3.3 │ 9.0 │ 0.3208 │ -0.6078 │
│ pow(10.0, ln(value)/ln(10.0)) │ 100000000.0 │ 0.0 │ 10.9 │ 40.0 │ 0.5894 │ -0.1609 │
│ pow(10.0, ln(value)*(1/ln(10.0))) │ 100000000.0 │ 0.0 │ 17.0 │ 50.0 │ 0.4421 │ -0.07934 │
└───────────────────────────────────┴─────────────┴─────┴──────┴──────┴────────┴──────────┘
Run Time: real 490.883 user 476.640625 sys 13.937500
Clearly the use of the runtime log10 function is more accurate than calculating log10 using napierian log functions.
Note The 64-bit MSVC is using entirely double arithmetic (no floating point longer than double is supported). The 32-bit GCC is is using long double (wherever specified) and extended precision intermediates everywhere. The 64-bit GCC is using 128-bit floats (whereever specified) and extended precision intermediates everywhere. This is why the MSVC version is so fast and the 64-bit GCC version so slow.
(11) By Keith Medcalf (kmedcalf) on 2022-05-27 05:09:34 in reply to 10.5 [link] [source]
Using a different method of sample generation that poduces a more uniform sample set, the following results are obtained, showing the huge difference in the error size for the different calculation methods:
SQLite3 64-bit MinGW GCC 10.2.0
SQLite 3.39.0 2022-05-27 03:43:57 2ddd141d4583206cd759fd164f42a8739da823d783e5b699e148663a9211a7e6
zlib version 1.2.12
gcc-10.2.0
┌────────────────────────────────────────────────┬───────┬─────────────┬────────────┬────────┬─────────────┬───────────┬────────┐
│ Method │ Count │ Min │ Avg │ Max │ StDevP │ SkewP │ KurtP │
├────────────────────────────────────────────────┼───────┼─────────────┼────────────┼────────┼─────────────┼───────────┼────────┤
│ Sample Statistics │ 1e+07 │ 9.9019e-308 │ 1.363e+305 │ 1e+308 │ 2.6188e+306 │ │ │
│ ulps(value, pow(10.0, log10(value))) │ │ -589.0 │ -0.4 │ 589.0 │ 138.71 │ 0.0071801 │ 1.9461 │
│ ulps(value, pow(10.0, ln(value)/ln(10.0))) │ │ -1574.0 │ 204.4 │ 1658.0 │ 255.84 │ 0.82545 │ 2.3597 │
│ ulps(value, pow(10.0, ln(value)*(1/ln(10.0)))) │ │ -1950.0 │ 332.0 │ 2035.0 │ 332.97 │ 0.46383 │ 1.949 │
└────────────────────────────────────────────────┴───────┴─────────────┴────────────┴────────┴─────────────┴───────────┴────────┘
Run Time: real 69.139 user 68.890625 sys 0.218750
(12) By KIT.james (kjames3411) on 2022-05-27 12:13:30 in reply to 10.5 [link] [source]
Sorry I cannot examine all of your data now, but the point of my post was not to say that the algorithm did not become better, but that precision has nothing to do with it.
Of course it's obvious you can expect algorithms to become better and better.
(7.1) By Keith Medcalf (kmedcalf) on 2022-05-25 19:11:57 edited from 7.0 in reply to 4 [link] [source]
The error is 1 ULP.
I simply modified func.c so that it would use the standard library log10 function directly if you define HAVE_LOG10 during compile.
Index: src/func.c
==================================================================
--- src/func.c
+++ src/func.c
@@ -2092,21 +2092,34 @@
default:
return;
}
ans = log(x)/b;
}else{
+#if !defined(HAVE_LOG10)
ans = log(x);
+#endif
switch( SQLITE_PTR_TO_INT(sqlite3_user_data(context)) ){
case 1:
/* Convert from natural logarithm to log base 10 */
+#if !defined(HAVE_LOG10)
ans /= M_LN10;
+#else
+ ans = log10(x);
+#endif
break;
case 2:
/* Convert from natural logarithm to log base 2 */
+#if !defined(HAVE_LOG10)
ans /= M_LN2;
+#else
+ ans = log(x) / M_LN2;
+#endif
break;
default:
+#if defined(HAVE_LOG10)
+ ans = log(x);
+#endif
break;
}
}
sqlite3_result_double(context, ans);
}
This corrected the problem with MSVC/Windows and its inability to do arithmetic, a problem which GCC did not have on Windows. 64-bit Windows dally-farts with the floating point control registers and you have to un-dally-fart them in order to accomplish IEEE compliance (or use a compiler that automatically does the un-dally-farting for you, such as just about everything non-microsoft) -- or stick to 32-bit Windows, which can do IEEE compliant arithmetic.
However, using the log10 function directly with GCC (MinGW 10.2) yields correct answers, as does doing the conversion as is done now.
Neither the OS nor the compiler used were mentioned in the original post.