SQLite User Forum

Apparent date() bug with out-of-range days in short months
Login

Apparent date() bug with out-of-range days in short months

(1) By George King (gwking) on 2024-09-20 19:26:24 [link] [source]

I just discovered that as of sqlite 3.45.1, date() returns values for nonexistent dates like '2024-04-31', '2024-02-30', '2025-02-29', etc. It seems that SQLite will accept nonexistent dates with a day value between [29,31] even if the month is shorter than that. It then rolls the date over into the following month, thus returning a valid date that is different than the input. This seems inconsistent with the fact that it returns NULL for other nonexistent days, e.g. '2024-01-32'.

I discovered this because I was checking SQLite's handling of leap years but realized it is a more general issue.

It seems a little dangerous to allow this. For example, it means that someone could input a date "2024-04-31" that looks like it is in April, but then silently converts to a date in May. I'm not sure that I could argue it is a security issue, but it does mean that SQLite date() cannot, in a strict sense, be trusted to validate a date.

At the very least, the behavior should probably be documented in https://sqlite.org/lang_datefunc.html.

Thank you!

(2) By jose isaias cabrera (jicman) on 2024-09-20 19:38:49 in reply to 1 [link] [source]

returns values for nonexistent dates like '2024-04-31', '2024-02-30', '2025-02-29', etc.

Do you have a script that clearly shows the problem without any guess work?

(3.4) By Spindrift (spindrift) on 2024-09-20 20:18:09 edited from 3.3 in reply to 1 [source]

The answer here is simple.

Your comment here is entirely correct.

SQLite date() cannot [...] be trusted to validate a date.

Nowhere does it claim to do so.

Sqlite normalises dates where the day of month is potentially valid but ultimately doesn't exist when combined with a particular month and / or year.

Partly this is for purposes like "what is the date one year after 2024-02-29?"

You might find this comment (and possibly the whole of that thread) enlightening.

I would accept that the documentation you reference could include some indication of this behaviour.

If you truly wish sqlite to validate your date for you, just round-trip it and check you get the same date back (you will not in the case that it is invalid).

(4.4) By Aask (AAsk1902) on 2024-09-20 20:33:24 edited from 4.3 in reply to 1 [link] [source]

It seems that SQLite will accept nonexistent dates

The date() function does not validate its argument as a date but a function that returns a date from its argument.

sqlite> with cte(arg) as (values('2024-04-31'),('2024-02-30'),('2025-02-29')) select arg,date(arg) from cte;
arg         date(arg)
----------  ----------
2024-04-31  2024-05-01
2024-02-30  2024-03-01
2025-02-29  2025-03-01

See this thread also.

If you want to validate dates verbatim, consider this method.

sqlite> with cte(arg) as (values('2024-04-31'),('2024-02-30'),('2025-02-29'),('2024-09-20'))   select arg, coalesce(arg = date(arg, '+0 days'), 0)  as isValid from cte;
arg         isValid
----------  -------
2024-04-31  0
2024-02-30  0
2025-02-29  0
2024-09-20  1
sqlite> /* added a valid date for illustration */

(5) By Richard Hipp (drh) on 2024-09-20 20:27:34 in reply to 4.1 [link] [source]

The argument here is that 2024-04-31 and 2024-05-01 are two different ways of writing the same day. You can say that 2024-05-01 is the "preferred" form or the "simplest" form (because it has the smallest day number) and that's fine. The SQLite data/time functions always return the date in the simplest form. But SQLite accepts the date in either format.

See Postel's Law.

By defining dates this way, the definition of a "valid" date is much simpler: YYYY-MM-DD where MM is between 1 and 12 and DD is between 1 and 31. You don't have to do complex Gregorian date computations to figure out if the date is valid or not.

You still have to do the complex computations to determine if the two dates are equivalent or if a date is in its simplest form. But I don't know of a way around that.

(6) By George King (gwking) on 2024-09-20 22:30:52 in reply to 5 [link] [source]

Thank you for the explanation. I don't want to make a bigger deal out of this quirk than it deserves, but think it is worth noting in the documentation.

One interesting description of the behavior is that date() accepts exactly 12 * 31 = 372 'YYY-MM-DD' values but 6 or 7 of them (depending on the year) will be changed to something else.

I understand that the behavior is constrained by backwards compatibility, and I have the utmost respect for the care with which you develop SQLite. I also appreciate that calendars are difficult.

Anyway, I cannot help but wonder what the historical story is that leads to this behavior. Reading between the lines here, it sounds like you chose this simpler validation process in the name of performance. But if the implementation of date does the Gregorian calculation regardless, why accept "04-31"? Because you only want to do said calculation at the end, after processing all of the modifiers?

I guess my ideal outcome here would be that there was some PRAGMA strict_dates = 1, or perhaps a strict_date() function so that SQLite would reject the "alternate spellings" and thus roughly match the behavior of javascript Date.parse(), Python date.fromisoformat(), and Postgres to_date() parsing functions. Then I would just turn it on and never worry about it again.

In any case, thanks again for the explanation!

(7) By Larry Brasfield (larrybr) on 2024-09-21 00:46:56 in reply to 6 [link] [source]

Maybe it would be simpler to allow any integer for a day of a month. The logic would then be consistent, without a cutoff that is wrong for some months and not others. The day of month number Ndom would always refer to Ndom-1 days after the 1st day of the given month. (And, why not allow integers less than 1?) I submit that this would come closer to honoring Postel's law that the present scheme.

(8.1) By jchd (jchd18) on 2024-09-21 23:38:56 edited from 8.0 in reply to 3.4 [link] [source]

If you truly wish sqlite to validate your date for you, just round-trip it and check you get the same date back (you will not in the case that it is invalid).

Not entirely correct, for two distinct reasons: one that can be expected from reading the documentation, the other less intuitive.

The fine doc says (last phrase) that date time functions don't handle leap seconds. So a valid UTC timestamp like '1972-06-30 23:59:60Z' (first leap second added) results in NULL. While it's understandable (but debatable) that leap seconds are not dealt with, the doc explicitely warns us.

A more troublesome issue is when non-existing local timestamps are fed to date functions.

If the current timezone & DST settings of the machine tell us that DST is in force, a full range of local timestamps simply never exist at the moment summer time gets in.

For instance in France timestamp went like this on last Sunday of March at 01:00 UTC this year:

       2024-03-31 01:59:58  <-- UTC+1 winter time
       2024-03-31 01:59:59
                            <-- summer time shift
       2024-03-31 03:00:00  <-- UTC+2 summer time
       2024-03-31 03:00:01

Hence the local time range 02:00:00 - 02:59:59 never occurs, making these timestamps invalid.

Yet, SQLite date functions do not flag them as invalid and proceed with internal adjustments anyway. I don't blame SQLite too much: local time, with all the complexities it brings, is impossible to get correct in the general case (even if you bring in the IANA tzbase), despite being the timescale we all use daily.

Another problem with local time, when DST is used, is at the switch from summer to winter time. A full range of valid timestamps occur then twice in a row!

For instance in France timestamp will go like this on last Sunday of October at 01:00 UTC:

       2024-10-27 01:59:58  <-- UTC+2 summer time
       2024-10-27 01:59:59
       2024-10-27 02:00:00
       2024-10-27 02:00:01
       ...
       2024-10-27 02:59:58
       2024-10-27 02:59:59
                            <--winter time shift
       2024-10-27 02:00:00  again!  <-- UTC+1 winter time
       2024-10-27 02:00:01  again!
       2024-10-27 ...       again!
       2024-10-27 02:59:58  again!
       2024-10-27 02:59:59  again!
       2024-10-27 03:00:00
       2024-10-27 03:00:01

So datetime('2024-10-27 02:00:00', '-1 hour', 'utc') should be undefined (aka NULL) when run on a French machine, because there is no way to decide if that stamp refers to the first occurence or the second.

Local time in the general case, useful as it is, is broken beyond repair. Again SQLite can't solve this.

(9) By DrkShadow (drkshadow) on 2024-09-23 08:09:26 in reply to 5 [link] [source]

The argument here is that 2024-04-31 and 2024-05-01 are two different ways of writing the same day.

Why is the date special? Why not the month? Why isn't '2024-13-02' (2025-01-02) accepted, by the same argument that '2024-02-31' is accepted?

The code is already taking into account years, months, leap-years (leap centuries? 2000?), days in the month -- it's trivial to check that it's a valid day of the month, as it already is checking for a valid month-value. Never, ever was I taught in elementary school that "The 35th day of the month is just four more than the end of last month."

(10) By Adrian Ho (lexfiend) on 2024-10-01 01:29:12 in reply to 9 [link] [source]

Why is the date special? Why not the month? Why isn't '2024-13-02' (2025-01-02) accepted, by the same argument that '2024-02-31' is accepted?

Because some months have 31 days, but no years have 13 months?