4-byte float possible in SQLite?
(1) By anonymous on 2022-01-31 16:30:23 [link] [source]
New to SQLite. The following documentation is relevant: https://www.sqlite.org/floatingpoint.html https://www.sqlite.org/datatype3.html Both imply, but neither state explicitly, that a floating point number (say 3.14159) will be stored in a REAL column as an 8-byte value only, even if the developer intends a 4-byte float (i.e. System.Single in C#.NET or float in C++). There is some confusion with the description of INTEGER columns in that the value is described as being "0, 1, 2, 3, 4, 6, or 8 bytes". Is there some flexibility with how INTEGER values are stored, but no flexibility with REAL values? Could someone please clarify: - Are SQLite REAL values *always* only 64-bit (mapping to System.Double in C#.NET or double in C++)? - Is there any way to coerce SQLite to save a number as 32-bit (mapping to System.Single or float)? [Note: I understand it may be desirable to always save REAL numbers in an unambiguous 64-bit format...just trying to learn] Thanks, Alain
(2) By Warren Young (wyoung) on 2022-01-31 16:40:43 in reply to 1 [link] [source]
With integers, SQLite can look at the value's magnitude to determine how many bytes it needs to use to store it faithfully. You can’t do that with FP. You only get to decide how much precision to keep, which isn’t encoded in the value anywhere.
Pi is a particularly bad example. There’s infinite precision there. How much should SQLite try to store?
I realize you’re asking for a type of prior restraint. SQLite doesn’t do that. Its type system is closer to a dynamic language like Tcl than to a static one like C#.
(3) By alain (abdekker) on 2022-01-31 17:02:09 in reply to 2 [source]
Thanks Warren. This question is unrelated to the nifty (but mathematically impossible) feat of storing an infinite amount of data in a finite storage space. If anyone thinks the "3.14159" example was unfortunate because it triggers a "That's PI and he's asking how to break the laws of physics!" bias in some readers, please substitute "1.23" instead. This question is only about SQLite floating point values (where the developers' intention might be to save precisely 4-bytes). Is there any flexibility there? The documentation (perhaps strongly!) implies that REAL values are always saved as 64-bit only. If that is the case, the documentation might be improved by explicitly dealing with the "4-byte float" issue (both in storage and retrieval).
(5.2) By Warren Young (wyoung) on 2022-01-31 17:32:10 edited from 5.1 in reply to 3 [link] [source]
please substitute "1.23"
Okay:
$ sqlite3 ':memory:' \
'.mode list' \
'create table a(a)' \
'insert into a values(1.23)' \
'select printf("%.15f", a) from a'
$ 1.229999999999999
So, how many of those nines did you want SQLite to magically figure out it should store instead of the 13 it's doing at the moment? Why is that value obviously correct?
EDIT: I do realize that some SQL database servers let the DBA answer that question at table creation time. My point about "no prior restraint" is that SQLite is not one of these. It stores all the bits it has. With integers, it's able to figure out a way to save unnecessary bytes, but with FP, it would have to have a feature that made you force it to discard what you, the DBA, consider "unnecessary." Without that feature, it will not guess.
(6) By Larry Brasfield (larrybr) on 2022-01-31 17:48:59 in reply to 5.2 [link] [source]
Theoretically, (to be distinguished from practically), noting the number of trailing mantissa zero bits could be stored more compactly, and small exponents, perhaps centered around the exponent for 1.0, could also be stored more compactly. This would allow floats that are the product of a small integer and an integer power of 2 to be "efficiently" stored.
If such a feature existed in anything I use, I would hope a built-time option existed to remove it.
That all said, doubles are valued mainly for computation, and not quite so much for representing quantities in the real world. It is common to have no more than 6 (decimal) digits of precision in physical quantities that have been measured. In my opinion, it would have been sensible for SQLite to support a smaller float along with its present double.
(4) By Larry Brasfield (larrybr) on 2022-01-31 17:03:55 in reply to 1 [link] [source]
Both imply, but neither state explicitly, that a floating point number (say 3.14159) will be stored in a REAL column as an 8-byte value only,
Please look at the middle bullet item in section 2 of the doc you cited and get back if that contention survives your careful reading.
... even if the developer intends a 4-byte float (i.e. System.Single in C#.NET or float in C++).
Taking off from the thread title: I do not think that it is possible for the designation of a smaller stored FP value to be keyed off of developer intent.
There is some confusion with the description of INTEGER columns in that the value is described as being "0, 1, 2, 3, 4, 6, or 8 bytes".
There is some usage of the passive voice. I do not believe that the doc is unclear at all, particularly since section 2 has a link just like this: "This makes a difference on disk". I am prepared to believe that an overly quick reading might leave some confusion, but that would not survive a diligent study of the clearly documented record format by an ordinary developer.
Is there some flexibility with how INTEGER values are stored, but no flexibility with REAL values?
The only "flexibility" is in choosing the integer value to be stored. The storage taken depends only on that, for integers. However, when REAL values are stored into either an untyped or REAL column, C doubles are stored, all 8 bytes of them.
Could someone please clarify:
- Are SQLite REAL values always only 64-bit (mapping to System.Double in C#.NET or double in C++)?
- Is there any way to coerce SQLite to save a number as 32-bit (mapping to System.Single or float)?
Yes. Yes. Take a look at the BLOB type. If you use it, you will not be able to exploit SQLite's ability to do arithmetic on values coming out of the DB, and you will need to do some modest type punning in C/C++ (or whatever language-dejour) and worry about endianess for values going into or coming out of the DB, but you can squeeze your FP values down to 4 bytes without rewriting part(s) of SQLite or waiting for it to gain a new datatype.
... Note: I understand it may be desirable ...
Good. We need not have that discussion.
(7.1) By alain (abdekker) on 2022-02-10 15:40:20 edited from 7.0 in reply to 4 [link] [source]
Thanks Larry. Your answer is clear enough: "No flexibility for FP numbers which are always stored as an 8-byte floating point number (a C double)".
The link to the record format (https://www.sqlite.org/fileformat2.html#record_format) was useful since it shows that there is no flexibility in the serial type to account for anything other than 64-bit FP numbers.
IMO, the data types document (https://www.sqlite.org/datatype3.html) might be slightly improved by the addition of "only" at the end of this sentence in section 2:
"REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number. No other floating storage class is used in SQLite."
(8) By Kees Nuyt (knu) on 2022-02-01 16:37:36 in reply to 7.0 [link] [source]
by the addition of "only" at the end of this sentence in section 2: "REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number [only]."
That would not be correct. In 3. Type Affinity
it states an exception:
If a floating point value that can be represented exactly as an integer is inserted into a column with NUMERIC affinity, the value is converted into an integer.
--
Regards,
Kees Nuyt
(10) By alain (abdekker) on 2022-02-01 16:48:05 in reply to 8 [link] [source]
Thanks Kees. Yes, that's true. So the update to the documentation in section 2 would need to be more careful. It is currently incomplete and ambiguous.
Maybe something like the following?
"REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number only. Section 3 below discusses an exception where the value can be represented exactly as an integer."
(12) By Larry Brasfield (larrybr) on 2022-02-01 17:01:52 in reply to 10 [link] [source]
(Answering Kees also.)
No, that documentation bullet item is not incorrect or at odds with library behavior. The language in question is not about affinity and what will be stored under various update/insert scenarios. It is about what is stored in the DB, as can be determined by examination of its binary record data.
You may note that "NUMERIC" is not among those 5 bullet items, but is an indispensable part of the point Kees is making.
(15) By Larry Brasfield (larrybr) on 2022-02-01 17:58:34 in reply to 8 [link] [source]
In
3. Type Affinity
it states an exception:
Kees, since you appear to be confounding "type affinity" with "storage class", in order to have viewed that "exception" as an exception, I wonder if these two concepts have not been made clear enough yet. There is much documentation covering these terms already, and so I hesitate to add to that. But if something about it leads to confusion between them, or if a summary or glossary is needed to help distinguish them, that is certainly worth considering.
Do you have any suggestions? (I ask because your contributions are typically well informed and apt.)
(25) By Kees Nuyt (knu) on 2022-02-03 11:46:22 in reply to 8 [link] [source]
My previous contribution was not accurate, I mixed up storage class and type affinity.
The docs are clear, the five storage classes are mutually exclusive. Type affinity determines which storage class will be used, the declared column type is an important factor in that.
I apologize for any confusion this may have caused.
--
Regards,
Kees Nuyt
Lord please help me to keep my big mouth shut until I know what I am talking about.
(9) By Larry Brasfield (larrybr) on 2022-02-01 16:44:19 in reply to 7.0 [link] [source]
IMO, the data types document (https://www.sqlite.org/datatype3.html) might be slightly improved by the addition of "only" at the end of this sentence in section 2:
"REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number [only]."
I am disinclined to add such pleonastic surplusage. (And to perhaps avoid seeming overly abrupt on this:) Here is why: When language is added to something that is already clear and unambiguous, it has several undesirable effects. One is that it takes extra time to read, (which is admittedly a small cost for a single word.) Another is that, because a common language pattern is {general assertion} followed by {limiting or qualifying clause(s)}, the addition of something that has the syntactic appearance of a qualifier but which alters the preceding assertion in no way is likely to lead to reader distraction as they waste brain cycles wondering if the qualifier is somehow subtly altering the meaning. Then, when or if a reader conversant with the language realizes that the addition is utterly redundant, they wonder why the author does that. Finally, such practice weakens a proposition that we try to ensure is true across the documentation, which is: Unqualified and clear statements of what happens X, under Y circumstances, are definitive such that it would be a bug (or documentation defect) if anything other than X ever does happen under Y circumstances.
If ever a 4-byte FP value were to be stored where a REAL was supposed to be, it would be clearly incorrect, in either the doc or the code, for the assertion "REAL is stored as an 8-byte number" to coexist with that behavior. This would be equally true if "only" was added (or removed) from the assertion. Hence, the word would add nothing.
Anybody who spots obviousa and uselessa redundancy in the docs is welcome to point it out, whereupon it will be treated similarly to spelling and grammar errors which are routinely corrected when noticed.
a. There is redundancy in the docs which is not obvious and is useful, typically in separated contexts. That is not a candidate for summary deletion.
(11.1) By alain (abdekker) on 2022-02-01 16:52:04 edited from 11.0 in reply to 9 [link] [source]
Thanks Larry. Its not unreasonable to be conservative, but Kees points out that the documentation is wrong. Suggesting a minor tweak, perhaps by someone who understands what "pleonastic surplusage" means, is hardly obtuse.
(13) By Larry Brasfield (larrybr) on 2022-02-01 17:03:53 in reply to 11.1 [link] [source]
The doc is not wrong. And I do not suggest that surplusage would be obtuse.
(14) By Ryan Smith (cuz) on 2022-02-01 17:10:31 in reply to 11.1 [link] [source]
A minor tweak, perhaps by someone who understands what "pleonastic surplusage" means, is reasonable.
To explain what what Larry said, minus the big words: If you sold Red Apples, you can make a sign that says "Red Apples for sale", it is quite adequate and people understand what it means.
It is unreasonable for shoppers to insist you add "and not Green apples" or even the simpler, but equally surplus, "Only".
What is more, if you do decide to add it to the sign to read, say "Red Apples for Sale Only" or perhaps "Only Red Apples for Sale" - you can see that it does not simplify the statement at all - indeed, it adds nothing but more questions.
Also, REALs are never stored as anything but 8-Byte floats, Kees' reference was to NUMERIC items which is a different colour horse and has adequate documentation to the effect in its own section.
(16) By alain (abdekker) on 2022-02-01 18:05:47 in reply to 14 [link] [source]
"Red Apples for sale" is a great example, thanks. Fair enough on the addition of "only" - it was only a suggestion.
In many places, the SQLite documentation makes it a virtue to flexibly choose the raw (binary) format of the data stored. Column affinity is used as a hint only, but SQLite stores pretty much any value (except for INTEGER PRIMARY KEY and BLOB columns).
Given that, this developer thinks the statement ambiguous. There may be others who think so too, even if most do not.
Explicitly excluding 4-byte floats just doesn't seem unreasonable. How that might be done, if at all, is down to the doc maintainers.
Thanks for the pointers and for answering the question: Only 8-byte floats are saved/retrieved from REAL fields regardless of the input.
(17) By Larry Brasfield (larrybr) on 2022-02-01 18:22:25 in reply to 16 [link] [source]
Given that, this developer thinks the statement ambiguous.
By "the statement", I presume from context that you refer to this:
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
What ambiguity do you see? Can you find a way to parse that language such that storing anything other than "an 8-byte IEEE floating point number" is consistent with its meaning? I am unable to find any reasonable interpretationa, (let alone likely interpretation), which admits an alternate object stored as REAL.
a. I exclude interpretations which might involve variation among versions of the "IEEE 754" standard, as the distinction is irrelevant in this context.
(19.2) By alain (abdekker) on 2022-02-03 17:04:14 edited from 19.1 in reply to 17 [link] [source]
You really have to "win" this argument, don't you? That you are unable to accept that someone else sees ambiguity might have given you pause to reflect. Language is such a complex thing. Just let it drop.
The SQLite documentation makes it clear that, where possible, only the data required is saved. For example, the string "fred" only occupies 4-bytes (plus the standard record header). The question was whether floating point (i.e. REAL) numbers are ever, or can ever be coerced to be, saved and retrieved in anything other than a 8-byte C double. The most obvious candidate would be the IEEE 4-byte C float. The answer is no which the documentation implied, but did not state.
Ciao.
(21) By Larry Brasfield (larrybr) on 2022-02-02 18:15:26 in reply to 19.0 [link] [source]
I don't have to "win", fortunately. I like to win, but in a different way than you seem to imagine. To me, a "win" would be to learn something, or to help somebody else learn something. To the minimal extent this has been an argument, I see that aspect of this thread as a net loss.
It's true that I see no ambiguity where you apparently do. I'm willing to see it, but will need some concrete assistance to do so. An alternative interpretation which is still consistent with the allegedly ambiguous statement would be enough. I gather that such is not forthcoming.
The question was whether floating point (i.e. REAL) numbers are ever, or can ever be coerced to be, saved and retrieved in anything other than a 8-byte C double.
On "Storage Classes", the doc says, "Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:", followed by a list of 5 items. With that clear, unambiguous phrasing, the list is exhaustive; there are no other storage classes. If there were, that leading sentence would be false. (And without the "Each", it would be misleading and in need of an edit.)
[Regarding whether a 4-byte float is among the stored datatypes]
... The answer is no, and the documentation implied, but did not state this.
If there is such an implication to be found in the documentation, made or encouraged by some small number of sentences (rather than a gestalt of the whole doc set or a loose impression formed from cursory study of the docs), then I will happily root it out. I do not presently know where or how such an implication is made, and I cannot act upon mere assertion.
(22) By Tim Streater (Clothears) on 2022-02-02 19:19:25 in reply to 19.0 [link] [source]
You said: The question was whether floating point (i.e. REAL) numbers are ever, or can ever be coerced to be, saved and retrieved in anything other than a 8-byte C double. The most obvious candidate would be the IEEE 4-byte C float. The answer is no, and the documentation implied, but did not state this. It didn't need to, any more than it has no need to state that SQLite does not store 773-byte floats. This whole thread has become a waste of time and energy, and, based on certain replies, I am minded to wonder what certain posters have been smoking. Those posters, Larry is slowly grinding down, and it is to be hoped they will give up.
(24) By JayKreibich (jkreibich) on 2022-02-02 20:58:56 in reply to 22 [link] [source]
This whole thread has become a waste of time and energy, and, based on certain replies, I am minded to wonder what certain posters have been smoking. Those posters, Larry is slowly grinding down, and it is to be hoped they will give up.
These are the kinds of discussions when you can pretty much tell who has a formal computer science education, and who simply "learned programming." I remember taking a whole computational algebra class that was basically "IEEE-754 numbers, and why nothing you learned in math class applies." It's one of those lines between "programming" and "software engineering," and I've seen it show up countless times-- usually in very bad, and/or very expensive, ways.
-j
(26) By alain (abdekker) on 2022-02-10 15:37:49 in reply to 22 [link] [source]
773-byte floats sounds like a dumb idea, perhaps provide a reference?
Something like "No other storage class is possible in SQLite" covers all possibilities - you wouldn't need to mention 4-byte floats, nor any other more exotic float either.
(23.1) By Warren Young (wyoung) on 2022-02-02 20:12:46 edited from 23.0 in reply to 19.0 [link] [source]
SQLite makes a virtue of saving only the data required.
You're ascribing a piece of software intelligent agency.
I tried to get this across to you with the "1.23" example above, but I seem to have missed my mark, so allow me to try again.
If you try to store an integer "1" in SQLite, it sees you've given it only one significant bit, so it stores just that one bit in the file. (This is the 0-byte case you quoted above.)
If you try to store a "3", it sees two significant bits, so it stores the value in a single byte, having no clever trick for storing two bits as it does for one bit.
If you give it a string "1.23" in a SQL query and it decides to treat it as a REAL, it gets 64 bits of data, every one of which is significant to the software; it can’t make a leap to throw away "insignificant" bits without a programmer sitting there making a conscious choice to force it.
There is no parallel between SQLite intelligently using 0, 1, 2, 3, 4, 6, or 8 bytes to store integers as with REAL values, short of Larry's trick up-thread of looking at 0 bits in the mantissa and exponent, which in any case wouldn't be what you want anyway.
The only option would be for SQLite to add a hard-coded fixed-size data type, which is completely against its dynamic, flexible nature.
We aren't saying what you want cannot be done. We're saying it isn't done now for good reason, one that isn't likely to change. There's a difference.
(18) By Keith Medcalf (kmedcalf) on 2022-02-01 20:58:14 in reply to 1 [link] [source]
You will note that although the API of SQLite3 and its internal mechanisms will only store floating point values as IEEE-754 64-bit binary floating point, there is nothing preventing you from storing and working with the values in your application using 32-bit binary floating point (or whatever format your little heart desires).
The 32-bit "application value" will merely be extended to 64-bits when passed into SQLite3 via the API, and the 64-bit stored value will be rounded down to 32-bits on retrieval.
(20) By alain (abdekker) on 2022-02-02 10:00:03 in reply to 18 [link] [source]
The existing application code below (in C#) works with another database (Microsoft Access in my case) which has both 4-byte and 8-byte data types: // Code omitted to actually extract the DataSet using IDbConnection, IDbCommand and IDbDataAdapter::Fill Dataset ds = GetDataSet(); float myFloat = (float)ds.Tables[0].Rows[0]["FloatField"]; double myDouble = (double)ds.Tables[0].Rows[0]["DoubleField"]; But change the database to SQLite and the application now throws an exception. The static cast for "float" needs to be converted to: float myFloat = Convert.ToSingle(ds.Tables[0].Rows[0]["FloatField"]); The "double" static cast continues to work because SQLite returns the data mapped to an object of type System.Double. This can have a considerable knock-on effect depending on how the original code was written or the relative performance of static cast compared with the "Convert.ToX" methods (which tend to be around 3-5 times slower).