Proposed slightly-incompatible change to date-time functions
(1) By Richard Hipp (drh) on 2021-11-19 13:07:44 [link]
As you know, SQLite currently supports three different representations for date/time values: 1. **TEXT** → *YYYY-MM-DD HH:MM:SS.SSS* 2. **REAL** → The [Julian day number], the number of days since noon (in Greenwich) on -4714-11-24, according to the proleptic Gregorian calendar. 3. **INT** → Unix-time or the number of seconds since 1970-01-01 00:00:00. : https://en.wikipedia.org/wiki/Julian_day In all cases, SQLite supports date ranges from 0000-01-01 00:00:00 through 9999-12-31 23:59:59. For the various date/time functions (`date()`, `datetime()`, `time()`, `julianday()`, `strftime()`) the date/time argument can be either TEXT or REAL and everything just works. The date/time argument can also be INT, but such INT values will still be interpreted as a Julian day number unless followed by the `'unixepoch'` modifier. ## Proposed change If the date/time argument is an INT value that is not within the range of 1721060..5373483, then it is automatically interpreted as a Unix timestamp, without needing the `'unixepoch'` modifier. The `'unixepoch'` modifier can still be used, but it would be redundant in this case. ## Benefits With this enhancement, queries could invoke SQL functions like `datetime()` on a column of type DATETIME without having to worry about the underlying storage format. The conversion would be completely automatic. Indeed, columns that are intended to store date/time values could hold a mixture of TEXT, REAL, and INT values, and queries using the date/time functions would always be able to interpret the values correctly. ## Possible Breakage SQLite only defines date/time values for 0000-01-01 through 9999-12-31. So for that date range, I don't see any way to mess this up. There is an ambiguity for numeric inputs - do we interpret them as Julian day or Unix timestamps - but this ambiguity only occurs for a limited range of values between 1970-01-20 22:04:19 and 1970-03-04 04:38:03. If you have unix timestamp value in that range, and you pass them into a datetime function as REAL numbers instead of as integers, they will get reinterpreted as Julian day numbers unless you use the `'unixepoch'` modifier. Perhaps a bigger problem would be Julian day numbers that are passed into into date/time functions as integers. That is only possible for dates that correspond to noon UTC. Such integer Julian day numbers would be interpreted as unix timestamps and transformed into date/times in vicinity of 1970-01-29. We could mitigate this by adding a new `'julianday'` modifier that forces the date/time argument to be interpreted as a Julian day number even if it is an integer. ## What do you think? Is the benefit of this change worth the small chance of breaking legacy applications?
(2) By Richard Damon (RichardDamon) on 2021-11-19 13:31:14 in reply to 1 [link]
One comment is that having Julian Day numbers as just an integer might not be as rare as you think, as if something is only storing the DATE something happened, and not a full Date/Time then that makes a lot of sense, and for historical data, that often makes sense as you may not really have information of the precision to add a time to the event. Also, I think SQLite could take times that just happen to be exactly at noon, and thus equal to an integer (and of reasonable value) and store it in the database as an integer as a form of compression, so just because the column value happens to be stored as an integer, doesn't mean that an integer value was sent to the database, it could have been a real value that just happened to have no fractional bits present. I think this may be a function of the affinity of the column, but we don't know the affinity of existing legacy date columns Maybe adding an 'Auto' modifier that makes the function decide format based on the type of the input would avoid the backwards compatibility break.
(3) By Richard Hipp (drh) on 2021-11-19 15:49:49 in reply to 2 [link]
> SQLite could take times that just happen to be exactly at noon, and thus equal to an integer, and store it in the database as an integer as a form of compression SQLite does indeed do this. But it also converts the integer back into a floating point number as it reads it back off the disk, so by the time it gets to the date/time SQL functions, it is a REAL again. > Maybe adding an 'Auto' modifier That's a good idea. I'll consider that. Recognize that all of this discussion is for 3.38.0 or later. This is not a change that is being considered for the upcoming 3.37.0 release.
(4) By Richard Damon (RichardDamon) on 2021-11-20 02:49:21 in reply to 3 [link]
>>> Indeed, columns that are intended to store date/time values could hold a mixture of TEXT, REAL, and INT values, and queries using the date/time functions would always be able to interpret the values correctly. >> SQLite could take times that just happen to be exactly at noon, and thus equal to an integer, and store it in the database as an integer as a form of compression > SQLite does indeed do this. But it also converts the integer back into a floating point number as it reads it back off the disk, so by the time it gets to the date/time SQL functions, it is a REAL again. The point I am making is that because it does this, you can't tell if the original was inserted as an integer or as a real if the affinity of the column allows this transform. Thus what seems like a small change in the schema for a table could drastically change the meaning of data that is sent to it. This is not saying that the idea is inherently bad, but it is worth some though, and careful documentation that includes comments about how the affinity of a column used with these functions might affect the interpretation of that data.(some affinities will NEVER have their values be interpreted as Unix Timestamps because they will be converted to REAL when read.
(5) By Richard Hipp (drh) on 2021-11-20 02:58:44 in reply to 4 [link]
> The point I am making is that because it does this, you can't tell if the original was inserted as an integer or as a real if the affinity of the column allows this transform. If the affinity is INTEGER, then you can insert 22 or 22.0 and you will always get back 22. If the affinity is REAL, you can insert 22 or 22.0 and you will always get back 22.0. But if the affinity is NUMERIC or BLOB then you always get back what you inserted. If you insert 22 you get back 22, and if you insert 22.0 you get back 22.0.
(6) By Richard Damon (RichardDamon) on 2021-11-20 13:49:02 in reply to 5 [link]
> If the affinity is INTEGER, then you can insert 22 or 22.0 and you will always get back 22. If the affinity is REAL, you can insert 22 or 22.0 and you will always get back 22.0. But if the affinity is NUMERIC or BLOB then you always get back what you inserted. If you insert 22 you get back 22, and if you insert 22.0 you get back 22.0. And the issue is that when looked in the lens of backward compatibility, a column declared 'DATE' will be NUMERIC, and if the application ONLY dealt with DATE (and not DATE-TIME) values it would have been very reasonable to work with integer Julian Day values, as they are much smaller in the database (and in the program) than REALS. So, changing the default to make this data be seen as Unix Timestamps would be a major breaking change to that class of applications. SQLite has a history of refusing to make some more obvious fixes of removing clearly incorrect behavior just because some existing applications depended on that incorrect behavior, and it seems out of place to break existing applications that actually have fully intended behavior. Adding a modifier to make it do that sort of behavior makes sense. Even adding a pragme to change the behavior so that modifier is the default (which might imply the need for a 'jday' or 'julian' modifier to get back to the current default) makes sense, as this doesn't break existing programs Maybe even adding a way to make a persistent change in the default for a given database (which if used might make the database unusable by earlier versions that don't understand the persistent default change) could make sense.
(7) By Keith Medcalf (kmedcalf) on 2021-11-21 02:42:30 in reply to 1 [link]
So as I see it **CURRENT SITUATION**: A 'text' argument is interpreted as an ISO-8601 subset datetime string. A 'numeric' argument is interpreted as a Julian Date number. The 'unixepoch' modifier forces the 'numeric' to be interpreted as a unixepoch timestamp rather than a Julian Date number. \*\* numeric means a numeric type, either integer or real Interpretation: Numeric values are always interpreted as Julian Date numbers UNLESS the 'unixepoch' modifier is specified. **PROPOSAL** Introduce "magical behaviour" which alters how a 'numeric' argument is interpreted depeding on its presentement type *and* its value range. **RAMIFICATIONS** Numeric arguments are no longer interpreted as Julian Date numbers. A new modifier must be introduced to restore the previous behaviour. All existing code that relies on the old behaviour must be modified so as to pass the new modifier string that the argument is to be interpreted as a Julian Date number. All existing code that relies on the numeric argument representing a unixepoch value rather than a julian date value *already* have the 'unixepoch' modifier. **ANALYSIS AND RECOMMENDATION** The addition of the automagical interpretation adds no benefit and will require all pre-existing code that depends on the current behaviour to be modified in order to continue to work as it does currently. If one wishes to break compatibility in this manner then I would recommend that a new modifier be added to indicate that the numeric argument is to be interpreted as 'julianepoch' or some such, and that the function **throws an error** if a numeric is presented and no modifier is specified. This will ensure that the change is **crumbling** and detected and analysed for its effect where it occurs. **A BETTER CHANGE** A pragma setting that has three values for how to interpret numeric arguments with no modifier specified: ``` pragma epoch = JULIAN; -- current state of affairs, default pragma epoch = UNIX; -- force 'unixepoch' unless julian is specified pragma epoch = MAGICAL; -- use "magical interpretation" unless a specific modifier is specified ``` This could, conceivably be extended to allow other epoch boundaries. For example: ``` pragma epoch = JULIAN; -- -4714-11-25 12:00:00 pragma epoch = DOTNET; -- 0001-01-01 00:00:00 pragma epoch = ANSI; -- 1601-01-01 00:00:00 pragma epoch = MJD; -- 1858-11-17 00:00:00 pragma epoch = EXCEL; -- 1900-01-00 00:00:00 pragma epoch = NTP; -- 1900-01-01 00:00:00 pragma epoch = UNIX; -- 1970-01-01 00:00:00 pragma epoch = IBMPC; -- 1980-01-01 00:00:00 pragma epoch = COCOA; -- 2001-01-01 00:00:00 ``` You could even specify the granularity: ``` pragma epochunit = SECOND; pragma epochunit = DAY; ``` The default would be, of course, `epoch = JULIAN` and `epochunit = DAY`, to match the presently existant code.
(8) By anonymous on 2021-11-21 05:06:42 in reply to 7 [link]
Wishful thinking... strftime("%f") microseconds, on windows at least, would also be a really nice addition/option
(15) By Keith Medcalf (kmedcalf) on 2021-11-21 18:25:49 in reply to 8 [link]
There are a number of issues with this. The first is that internally the datetime value is maintained in "milliseconds since the julian epoch" as a 64-bit integer. This means that there is no way to generate output with more resolution than a millisecond. Secondly, even though Windows internally maintains time to the hun (hundreds of nanoseconds) the call used to access that time has an update frequency of a tick (about 16 milliseconds). Although you can change the system call used to get the time so that it obtains the "precise" time rather than the tick time, it will still not be used internally by SQLite to any more than millisecond resolution. Although Unix systems will return the tod with a nanosecond resolution, the granularity is up to the OS, and in any case, only millisecond resolution is used internally.
(9) By Richard Hipp (drh) on 2021-11-21 11:37:29 in reply to 7 [link]
> Numeric arguments are no longer interpreted as Julian Date numbers. No. REAL values are interpreted as Julian Dates and INT values are interpreted as unix timestamps. Do you know of an application that stores an unix timestamp as a REAL or a Julian date as an INT? I do not. So, I'm wondering if the change will have any impact at all. > all pre-existing code that depends on the current behaviour \[must be\] modified The only applications that would need to be modified are those that store Julian Dates as INT, or that store Unix timestamps from the first three months of 1970 as REAL. Do you know of any such applications? I'm guessing that the number of such applications is zero, at least to a good approximation. Do you know of any counter-examples?
(10) By Richard Damon (RichardDamon) on 2021-11-21 12:40:10 in reply to 9 [link]
I know I am currently working on one that does, and I find it hard to imagine that others would not. The key is that Julian date (or minor modifications on it) are a very good compact method to store historical dates. If you are only concerned about DATE, and not the time, then the Julian Day Number as an integer is an ideal storage format, no need to waste 8 bytes when less will do. And if dealing with historical times Julian Dates as floats have problems that there represent the time in GMT, not local, and it is unlikely that you have the time specified as GMT, and if you want to actually DO anything with those values you need to know your local time offset (not just 'time zone, as those didn't exist yet).
(11) By Richard Hipp (drh) on 2021-11-21 12:44:17 in reply to 9 [link]
For that matter, do you know of any application (other than Fossil) that uses Julian dates? Must apps that I have seen use Unix time. They all already use the ‘unixepoch’ modifier and require no changes. I’m interested to hear about any exceptions…
(13) By Richard Damon (RichardDamon) on 2021-11-21 13:22:37 in reply to 11 [link]
Unix time stamps are basically worthless for real HISTORICAL data (more than about a century old). It really depends on what sort of data you are processing.
(12) By jchd (jchd18) on 2021-11-21 12:56:57 in reply to 9 [link]
As others have said, many applications store JDNs (Julian Day Numbers) as INT. In many use cases dates are best expressed in whole days, e.g. accounting. For instance I've almost never seen an invoice with h:m:s timestamps. In history only whole days are significant and you expect sign of friendship on your "integral" birthday no matter the clock hour. While I fully understand the reason for the proposed change, I suspect this would break a significant number of real-world applications. An SQLite JDN is currently Julian date compatible but would no more yield a valid result if the change was silently applied. By "silently" I mean that its highly probable that latest SQLite releases are often used 'as is' (as amalgamation or directly as a downloaded .DLL) without reading the release notes in full. This thanks to the high confidence people have built wrt SQLite code. People know by experience that subsequent SQLite releases are never worse than previous, fixing obscure bugs, improving speed or adding new features. Application code (whatever language they are built with) or libraries often offer routines to deal with JDNs or JDN differences. The perspective of a number of sqlite3.dll hanging around on the average Windows storage for instance, some speaking JDN, some speaking epoch, is quite embarrasing. From my point of view, this would be a breach of contract.
(14.1) By Keith Medcalf (kmedcalf) on 2021-11-22 20:30:35 edited from 14.0 in reply to 9 [link]
> No. REAL values are interpreted as Julian Dates and INT values are interpreted as unix timestamps. Do you know of an application that stores an unix timestamp as a REAL or a Julian date as an INT? I do not. So, I'm wondering if the change will have any impact at all. I do. Why do you assume that unix timestamps are integers? I also sometimes store (when used for other purposes) Julian Day (integer) counts. I have also used Rata Die daystamps (aka the unit is day but the epoch is 0001-01-01 0:00:00.000). Once, oddly enough, I had to use ANSI month stamps (that is, the epoch was the ANSI epoch, 1601, and the unit was month and the fractional part was the fraction of the month elapsed -- that makes for an interesting conversion algorithm...) ``` CREATE TABLE x(x); sqlite> INSERT INTO x VALUES('-4713-11-24 12:00:00.000'); sqlite> INSERT INTO x VALUES('1970-01-01 00:00:00.000'); sqlite> INSERT INTO x VALUES('2021-11-21 17:37:38.000'); sqlite> INSERT INTO x VALUES('9999-12-31 23:59:59.000'); sqlite> select x as timestring, unixtime(x) as UnixEpochSeconds, ulp(unixtime(x)) as ResolutionSecU, julianday(x) as JulianEpochDays, ulp(julianday(x))*86400 ResolutionSecJ from x; ┌──────────────────────────┬──────────────────┬──────────────────────┬──────────────────┬──────────────────────┐ │ timestring │ UnixEpochSeconds │ ResolutionSecU │ JulianEpochDays │ ResolutionSecJ │ ├──────────────────────────┼──────────────────┼──────────────────────┼──────────────────┼──────────────────────┤ │ -4713-11-24 12:00:00.000 │ -210866760000.0 │ 3.0517578125e-05 │ 0.0 │ 9.59232693276135e-12 │ │ 1970-01-01 00:00:00.000 │ 0.0 │ 1.11022302462516e-16 │ 2440587.5 │ 4.02331352233887e-05 │ │ 2021-11-21 17:37:38.000 │ 1637516258.0 │ 2.38418579101563e-07 │ 2459540.23446759 │ 4.02331352233887e-05 │ │ 9999-12-31 23:59:59.000 │ 253402300799.0 │ 3.0517578125e-05 │ 5373484.49998843 │ 8.04662704467773e-05 │ └──────────────────────────┴──────────────────┴──────────────────────┴──────────────────┴──────────────────────┘ sqlite> ``` The resolution of each format (as a double precision floating point number) is at least capable of containing the entire iJD (milliseconds since the julian epoch) to the internal resolution of a millisecond. Assuming that an INTEGER meaning "seconds since the Unix epoch" and REAL meaning "days since the Julian epoch" is a breaking change in that anything that required "seconds since the Unix epoch" already has the appropriate unit (modifier) specified, whereas, there is presently no unit (modifier) for "days since the Julian epoch" as that is the default. This means that all code that requires "days since the julian epoch" will now either (a) require a change to add the appropriate modifier or (b) risk misinterpretation of the value. ** NB: unixtime is a UDF that takes the iJD value from a datetime struct and converts it from days (milliseconds, actually) since the julian epoch to seconds since the unix epoch and returns the full value (a real).
(16) By Richard Hipp (drh) on 2021-11-21 18:55:37 in reply to 14.0 [link]
## Current algorithm: 1. If the timevalue is TEXT → ISO8601 ("YYYY-MM-DD HH:MM:SS.SSS") 2. If the timevalue is numeric and followed by 'unixepoch' → Seconds since 1970-01-01 3. If the timevalue is numeric → Days since -4713-11-24 12:00. ## Proposed New Algorithm (2nd Edition): 1. If the timevalue is TEXT → ISO8601 as before 2. If the timevalue is numeric and followed by 'unixepoch' → Seconds since 1970-01-01 3. If the timevalue is numeric and greater than or equal to 5373484.5 → Seconds since 1970-01-01 4. If the timevalue is numeric → Days since -4713-11-24 12:00. In prose: if the timevalue is numeric, but would not make sense as a Julian day number because it codes for a day beyond 9999-12-31, then interpret the time value as a unix timestamp even if the 'unixepoch' modifier is omitted. Otherwise, everything is as before.
(17) By Keith Medcalf (kmedcalf) on 2021-11-21 19:17:03 in reply to 16 [link]
That should work without causing any breakage. Note that rule 3 could specify that if the timevalue is numeric and "less than 0" or "greater than 5373484.5" then it is seconds since 1970-01-01. I have personally never used (or seen) a JD value less than 0 (though unixepoch values before 1970 (as in less than 0) I have seen and used).
(18) By Richard Damon (RichardDamon) on 2021-11-22 01:41:16 in reply to 16
That shouldn't break any reasonable existing code, it may want a note to explain the 'unusual' conditions. As Kieth mentioned, negative values would likely be safe, as the Epoch base was intentionally chosen so negative Julian Day values are very unlikely as it goes into 'pre-history' so any shuch date would be very artificial. Thinking a bit, it might be useful if there was some way to force a database that is using this new behavior to force older version that don't understand this to reject the database as unfit for that version.
(19) By Simon Slavin (slavin) on 2021-11-22 13:51:08 in reply to 16 [link]
In new step 2 will you be detecting both 'unixepoch' and ' unixepoch' (space character before the word) ? If you're allowing both in a liberal manner, can you allow 'epoch' and ' epoch' too ? But keeping compatibility by allowing just 'unixepoch' is okay too. What happens the value doesn't qualify for any of new steps 1 to 4 ? I suppose that would be either NULL or BLOB but perhaps there are other things. Perhaps the value is TEXT but the first character is not digit, plus, minus, or decimal separator.
(20) By Keith Medcalf (kmedcalf) on 2021-11-22 20:43:12 in reply to 19 [link]
I would presume that a BLOB is "interpreted" as a text string and that if the text string is invalid, then the result is NULL. Similarly, the propagation of NULL requires that a NULL in means NULL out. Any value that is not valid would cause a NULL return. This is current/present behaviour.