Sum precision ?

# Sum precision ?

```Hello !
Doing some tests I found that summing floating points with sqlite gives slightly different results compared to C, I understand that floating point arithmetic with big and small values is problematic and I'm leaving it here just in case someone can see a way to improve it.

====
create table S(
id integer primary key not null,
value real
);

WITH RECURSIVE
cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<6000008)
insert into S select x as id, iif((x % 2) == 0, 0.0000000123, 12345.000000000) as value
from cnt;

--select * from S limit 5;

select printf("count %d, sum: %35.15f", count(*), sum(value)) from S;
====
sqlite3 < test-sum.sql
====
count 6000008, sum:         37035049380.000150000000000
====

C
====
#include <stdio.h>

int main()
{
#define DSIZE 60000008
static double d[DSIZE];

for (int i = 0; i < DSIZE; ++i)
{
d[i] = (i % 2) == 0 ? 0.0000000123 : 12345.0000000001;
}

double sum = 0.0;
for (int i = 0; i < DSIZE; ++i)
{
sum += d[i];
}

printf("count: %d, sum: %35.15f\n", DSIZE, sum);
return 0;
}
====
./test-sum
====
count: 60000008, sum:        370350049380.000183105468750
====

C with libgmp
====
#include <stdio.h>
#include <gmp.h>

int main()
{
#define DSIZE 60000008
mpf_t sum, a, b;
mpf_init(sum);
mpf_init(a);
mpf_init(b);
mpf_set_d(sum,0.0);
mpf_set_str(a, "0.0000000123", 10);
mpf_set_str(b,"12345.0000000001", 10);

for (int i = 0; i < DSIZE; ++i)
{
mpf_add(sum, sum, (i % 2) == 0 ? a : b);
}

printf("count: %d, sum: ", DSIZE);
mpf_out_str(stdout, 10, 35, sum);
puts("\n");
return 0;
}
====
./test-sum-gmp
====
count: 60000008, sum: 0.37035004938037200005e12
====

GLPK
====
param  dsize := 6000008;
set S := {1..dsize};
param d { i in S} := if ((i mod 2) == 0) then 0.0000000123 else 12345.0000000001;

printf "count: %d, sum: %35.15f\n", card(S), sum {i in S} d[i];
end;
====
glpsol -m test-sum.mod
====
count: 6000008, sum:         37035049380.000160217285156
====
```

### (2) By Igor Tandetnik (itandetnik) on 2020-06-30 14:17:55 in reply to 1 [link]

```The two fragments are not remotely similar. E.g. SQL adds up 6000008 numbers (five zeros), while C adds up 60000008 numbers (six zeros). The "slightly different results" in fact differ by an order of magnitude.
```

### (3.1) By Larry Brasfield (LarryBrasfield) on 2020-06-30 14:22:50 edited from 3.0 in reply to 1 [link]

```(Edited in light of Igor's sharp-eyed observation.)

Once you fix the blunder Igor mentions, and continue to observe imprecision:

In the programming language [Scheme](https://en.wikipedia.org/wiki/Scheme_\(programming_language\)), floating point types are considered "inexact". Among other things, this means that when results are computed in different ways that *appear* to be mathematically equivalent [a], the results may not be equal.

[a. Even though many languages have expression syntax that resembles mathematical notation, they are specifying operations that are only resemble mathematical operator definitions. ]

You may be unaware that on many computers, intermediate terms in expression evaluation can be held with a higher precision than values that are stored. SQLite's execution engine does not exploit that possibility; instead each result of a binary operator is stored without such higher precision. This means that its floating point inexactitude will manifest slightly differently than would be seen where such results are left in and reused from higher precision registers within the floating point arithmetic unit. This is the most likely factor behind the minute discrepancies you have focused upon.

That all said, I see a way to improve matters here:

Given that the requirement is unstated or nonexistent, and accuracy already achieved is typical of a properly operating computer, declare it "Good Enough". That is an honored term of art in engineering disciplines.

If you wish to embark on a quest for unbounded accuracy, there are software packages which do not impose the usual limits on numerical precision. Most of them impose only memory usage limits, but those can be quite expansive today.
```

### (4) By Richard Damon (RichardDamon) on 2020-06-30 14:19:52 in reply to 1 [link]

```One thing to remember that in SQL, you don't know what order the numbers will be added, which may influence your result.

Also, and I don't know if it is  typo, but your SQL is using 12345., while your C is adding a smidgen to it.
```

### (5) By Richard Hipp (drh) on 2020-06-30 16:39:27 in reply to 1 [link]

```The `decimal_sum()` SQL function is available (as an extension) in the
[prerelease snapshot].  That function does unlimited-precision decimal
arithmetic.  It returns an exact answer.

The decimal_sum() function uses more CPU cycles than sum(), obviously.

The decimal_sum() function is built into the CLI in the prerelease snapshot
But in order to use it in your applications, you will have to load it as
an extension.

```

### (6) By Keith Medcalf (kmedcalf) on 2020-06-30 20:01:41 in reply to 1

```If I "fix" you C code so that it mimics what my particular implementation of sum does (which uses a float128 intermediate), the answers are, strangely enough, identical.  Be careful when comparing a "two baskets of fruit" that your baskets are the same size, and that you "sum up" your apples and oranges in the same order.

```
sqlite> create table S(
...> id integer primary key not null,
...> value real
...> );
sqlite>
sqlite> WITH RECURSIVE
...>   cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<6000008)
...> insert into S select x as id, iif((x % 2) == 0, 0.0000000123, 12345.000000000) as value
...> from cnt;
sqlite>
sqlite> --select * from S limit 5;
sqlite>
sqlite> select printf('count %d, sum: %!35.6f', count(*), sum(value)) from S;
┌─────────────────────────────────────────────────────────┐
│ printf('count %d, sum: %!35.6f', count(*), sum(value))  │
├─────────────────────────────────────────────────────────┤
│ count 6000008, sum:                  37035049380.036903 │
└─────────────────────────────────────────────────────────┘
sqlite> ^Z

#include <stdio.h>

int main()
{
#define DSIZE 6000008
static double d[DSIZE];

for (int i = 0; i < DSIZE; i++)
{
d[i] = (i % 2) == 1 ? 0.0000000123 : 12345.000000000;
}

__float128 sum = 0.0;
for (int i = 0; i < DSIZE; i++)
{
sum += d[i];
}
printf("count: %d, sum: %35.6f\n", DSIZE, (double)sum);
return 0;
}

count: 6000008, sum:                  37035049380.036903
```

So I would presume that the result achieved in non-extended (ie, strict double) would be the same as well.

```
#include <stdio.h>

int main()
{
#define DSIZE 6000008
static double d[DSIZE];

for (int i = 0; i < DSIZE; i++)
{
d[i] = (i % 2) == 1 ? 0.0000000123 : 12345.000000000;
}

double sum = 0.0;
for (int i = 0; i < DSIZE; i++)
{
sum += d[i];
}
printf("count: %d, sum: %35.6f\n", DSIZE, (double)sum);
return 0;
}

count: 6000008, sum:                  37035049380.000153
```

And shiver me timbers, they are!  (Except that your original did not have the necessary ! to do "extended formatting" of the "dodgy bits" of floating point)

Note also that Richard's "decimal_sum" does indeed arrive at the *exact* answer, though it is mucho slower than the floating point methods.

```
sqlite> create table S(
...> id integer primary key not null,
...> value real
...> );
sqlite>
sqlite> WITH RECURSIVE
...>   cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<6000008)
...> insert into S select x as id, iif((x % 2) == 0, 0.0000000123, 12345.000000000) as value
...> from cnt;
sqlite>
sqlite> --select * from S limit 5;
sqlite> .timer on
sqlite>
sqlite> select printf('count %d, sum: %!35.6f', count(*), sum(value)) from S;
┌─────────────────────────────────────────────────────────┐
│ printf('count %d, sum: %!35.6f', count(*), sum(value))  │
├─────────────────────────────────────────────────────────┤
│ count 6000008, sum:                  37035049380.036903 │
└─────────────────────────────────────────────────────────┘
Run Time: real 0.627 user 0.625000 sys 0.000000
sqlite> select decimal_sum(value) from S;
┌────────────────────────┐
│   decimal_sum(value)   │
├────────────────────────┤
│ 37035049380.0369000492 │
└────────────────────────┘
Run Time: real 9.944 user 9.937500 sys 0.000000
sqlite>
```

You will note that the version using float128 intermediates is accurate to 15 significant digits, the version using extended precision (float64x or 80-bit) is accurate to 14 digits, and the one using strictly doubles to 13 digits, even though it should, on average, be correct to 1 ULP or 1e-5.  There is quite a lot of error introduced because you did not "sort" the numbers before summing them.
```

```Thank you for catch out my mistake !
But even after fixing it the differences remain.
```

```Thank you to catch another mistake done by me.
Now the results get closer.
```

```Thank you for pointing out it !
It takes nearly twice in processing time and gives the result very close to libgmp.

=====
/usr/bin/time sqlite3 < sum-test.sql
count 6000008, sum:         37035049380.000160000000000
3.22user 0.02system 0:03.25elapsed 99%CPU (0avgtext+0avgdata 115324maxresident)k

=====
/usr/bin/time sqlite3 < sum-decimal-test.sql
count 6000008, sum:         37035049380.037200000000000
5.91user 0.02system 0:05.94elapsed 99%CPU (0avgtext+0avgdata 115428maxresident)k
0inputs+0outputs (0major+28063minor)pagefaults 0swaps

=====
/usr/bin/time ./sum-test-gmp
count: 6000008, sum: 0.370350493800372000496e11

0.13user 0.00system 0:00.13elapsed 100%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+72minor)pagefaults 0swaps
=====
```

```Could you share your particular implementation of sum using float128 ?
```

```After looking at func.c and changing the structure SumCtx to use LONGDOUBLE_TYPE as used in printf.c the result indeed improve,but it goes a bit over the result of decimal_sum (after fixing my mistakes on the numbers, and the alternation on the big/small numbers are intentional for the testing):
=====
create table S(
id integer primary key not null,
value real
);

WITH RECURSIVE
cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<6000008)
insert into S select x as id, iif((x % 2) == 0, 0.0000000123, 12345.0000000001) as value
from cnt;

--select * from S limit 5;

--select printf("count %d, sum: %35.15f", count(*), decimal_sum(value)) from S;
select printf("count %d, sum: %35.15f", count(*), sum(value)) from S;

=====
/*
** An instance of the following structure holds the context of a
** sum() or avg() aggregate computation.
*/
typedef struct SumCtx SumCtx;
struct SumCtx {
LONGDOUBLE_TYPE rSum;      /* Floating point sum */
i64 iSum;         /* Integer sum */
i64 cnt;          /* Number of elements summed */
u8 overflow;      /* True if integer overflow seen */
u8 approx;        /* True if non-integer value was input to the sum */
};
=====

=====
/usr/bin/time sqlite3 < sum-long-long-test.sql
count 6000008, sum:         37035049380.037510000000000
3.27user 0.01system 0:03.29elapsed 100%CPU (0avgtext+0avgdata 115292maxresident)k
0inputs+0outputs (0major+28059minor)pagefaults 0swaps
=====
/usr/bin/time sqlite3 < sum-decimal-test.sql
count 6000008, sum:         37035049380.037200000000000
5.91user 0.02system 0:05.94elapsed 99%CPU (0avgtext+0avgdata 115428maxresident)k
0inputs+0outputs (0major+28063minor)pagefaults 0swaps
=====
/usr/bin/time sqlite3 < sum-default-test.sql
count 6000008, sum:         37035049380.000160000000000
3.22user 0.02system 0:03.25elapsed 100%CPU (0avgtext+0avgdata 115432maxresident)k
0inputs+0outputs (0major+28065minor)pagefaults 0swaps
=====
```

### (12) By Keith Medcalf (kmedcalf) on 2020-07-01 10:13:00 in reply to 11 [link]

```You made the change to func.c to use the LONGDOUBLE_TYPE instead of double in func.c

If LONGDOUBLE_TYPE is not defined then it is set `long double` so the easiest way is just to make sure the compiler is passed `-DLONGDOUBLE_TYPE=__float128` to override that default on the command line that builds the executable.

I build on WIndows, so I use MSVC and makefile.msc to make the amalgamation, then have all the options I want to use in config.h (including the define for LONGDOUBLE_TYPE) and set `-D_HAVE_SQLITE_CONFIG_H` on the gcc command line (the GCC build doesn't use the makefile).
```

### (13) By Keith Medcalf (kmedcalf) on 2020-07-01 10:14:58 in reply to 12 [link]

```You could also theoretically just change the type in SumCtx to be __float128 instead of double or LONGDOUBLE_TYPE
```

```Thank you for the tip !