IEEE 754 canonicalization in SQLite

(1) By Rico Mariani (rmariani) on 2022-02-28 19:50:03 [link] [source]

As I read the docs, SQLite canonicalizes floating point storage to IEEE 754, 8 bytes format, so that the DB is portable across architectures. Presumably it does the reverse as well.

Anyone know off-hand where that code is in the source? I need to do the same thing and I'd rather not do it from scratch... I don't there's an official API for this but I could excise it.

(2) By Richard Hipp (drh) on 2022-02-28 19:54:59 in reply to 1 [link] [source]

SQLite assumes that the CPU does this for it. In other words, it requires a CPU that support IEEE754 doubles. No coding/decoding is involved, other than byte reversal so that the floating point values stored on disk are always big-endian.

(3) By Rico Mariani (rmariani) on 2022-02-28 21:07:35 in reply to 2 [link] [source]

Wow... and you just go with big endian. Color me surprised. I figured that you'd have to support an ABI other than IEEE754 so you'd need a portable format. I'm obviously overthinking this. My code doesn't need to be more portable than SQLite that's for sure.

Thanks muchly.

(5) By JayKreibich (jkreibich) on 2022-03-01 01:06:09 in reply to 3 [link] [source]

Is there a modern architecture (i.e. last 30 years) that has native FP in something other than IEEE 754? As far as I know, even GPUs that do FP use 754 (except maybe some of the compact 16-bit formats).

-j

(6) By Rico Mariani (rmariani) on 2022-03-01 02:14:14 in reply to 5 [source]

Apparently there isn't... I figured something would be but it seems not. This is a good thing. I was prepared to use hexfloat...

(7) By Bill Wade (billwade) on 2022-03-01 19:41:42 in reply to 5 [link] [source]

... (i.e. last 30 years) ...

I believe that micro-vax machines shipped up until about y2k, and that they used VAX floating point (not IEEE).

Whether or not that is "modern" is debatable.

https://en.wikipedia.org/wiki/Pentium_FDIV_bug was discovered in '94. It is more likely to qualify as "modern". It used IEEE format, but got wrong answers for some operands. Perhaps "some wrong answers" means "not IEEE?"

I don't know which math functions are considered deterministic by sqlite. I'm sure that the value returned by sin(pi()*1e100) is wrong, in some sense (in part because the value returned by pi() is not exact). I don't know if it is consistent across machines.

(8.1) By Keith Medcalf (kmedcalf) on 2022-03-01 20:20:39 edited from 8.0 in reply to 7 [link] [source]

I'm sure that the value returned by sin(pi()*1e100) is wrong

Of course it is. There are only tau (2*pi) radians in a circle, so an angle of 1e100*pi is pretty wild. My slide rule gets the correct answer because I know that sin(1e100*pi()) is the same as sin(2*pi()) and therefore the "correct" answer is easily computable.

However, I doubt that the argument of trignometric functions is "scaled" to reasonableness (it is not part of the specfication), and even if it were, you are dealling with binary floating point approximations here, not symbolic mathematics.

(9) By JayKreibich (jkreibich) on 2022-03-01 21:38:07 in reply to 7 [link] [source]

I believe that micro-vax machines shipped up until about y2k, and that they used VAX floating point (not IEEE).

I meant designed or introduced in the last 30 years, not "was available for sale." 30 years might be close to forever in this business, but architectures last even longer.

IMHO, that's still pretty impressive, given that the IEEE 754 standard is less than 40 years old. It was embraced very quickly.

Perhaps "some wrong answers" means "not IEEE?"

No, it means there was a bug. The design intent was stated and specific. Most implementations are not to-the-letter compliant-- often intentionally so, in a desire to trade edge-cases for speed. The Pentium issue was just particularly bad because it wasn't exactly an edge-case, and it wasn't intentional.

Besides, the way I read it, the context of this conversation was focused on the cross-platform aspects of the data storage format, not the nuances of the calculations themselves. Given that most FPUs have a number of mode flags, many of which make the FPU non-compliant, that turns into a big ball of mud pretty quickly. While there are aspects of computing where those differences are very relevant, if you want to go down that path, you must also deal with the fact that there are three major revisions of the IEEE-754 standard (1985, 2008, 2019), and many sub-modes within the standard. It gets very complex very quickly. And at the end of the day, it likely doesn't matter, since SQLite doesn't have code or the ability to perform such calculations unless you link an external math or function library.

In terms of the original question, the core answer is that SQLite completely depends on the underlying hardware for data manipulation, conversion, and calculations when it comes to floating-point data values, their format, and their storage-- but that's OK, because basically "all" hardware is some flavor of IEEE 754.

(4) By David Jones (vman59) on 2022-02-28 22:46:56 in reply to 2 [link] [source]

There's nothing stopping a compiler and associated run-time from implementing IEEE754 doubles in software, which I believe was done for some low end CPU's. SQLite can just assume the results are correct IEEE numbers.