How to stores dates for my use case (best query performance).

(1) By CyD (cydside) on 2023-02-07 11:08:26 [link] [source]

Hi to all,
my use case is described in the following points:

Have to store date (no time needed) for events;
Will have many queries for events in the year;
Will have many queries for events in the month;
Will have enough queries for events in a day.
My intention is to store dates as text (instead of real or int) YYYY-MM-DD format and create an index on that field, executing query as
```
"... WHERE date_field LIKE '2023-%';"
```
or
```
"... WHERE date_field LIKE '2023-02-%';"
```
Am I right or in your experience there is a best approach?
I haven't found similar newbie questions in the archive so thanks in advance,
Danilo

(2) By anonymous on 2023-02-07 13:01:12 in reply to 1 [link] [source]

It depends on the size of your data, usually what you are proposing would work fine, but can you store millions of records per day maybe you should consider creating daily summaries of your data to speed up the aggregated queries

(8) By CyD (cydside) on 2023-02-07 19:47:41 in reply to 2 [link] [source]

From one hundred to two hundred thousand of events yearly.

(3) By Ryan Smith (cuz) on 2023-02-07 13:16:54 in reply to 1 [link] [source]

SQLite likes dates in ISO8601 format, so your idea of storing it as 'yyyy-mm-dd' is not only workable, it's also by far the best. SQLite thinks of dates as Numeric values, even if stored in string-like fashion.

As for best querying, the best and fastest way would be to declare the column with the event-date as a NUMERIC column and create an index on it.

Typical ways that you can query would be:

  WHERE event_date >= '2023-02-07' AND event_date < '2023-02-08' -- Single Day
  WHERE event_date >= '2023-02-01' AND event_date < '2023-03-01' -- Single Month
  WHERE event_date >= '2023-01-01' AND event_date < '2024-01-01' -- Single Year
also, if it never contains Time...
  WHERE event_date = '2023-02-07' -- Single Day
  WHERE SUBSTR(event_date,1,8) = '2023-02-' -- Single Month
  WHERE SUBSTR(event_date,1,5) = '2023-'    -- Single Year

and what is useful of the last 3 is that you can also declare partial indexes (based on those same functions) that would make searching very fast. This costs a lot of space in service of more speed.

Another idea is to simply store Year, Month and Day as columns, but it's cumbersome, apart from making simple queries easier, it complicates any other date-arithmetic you may want to do.

Bottom line, it's all real easy, almost any approach will be workable, so decide the best one for your needs.

(4) By Chris Locke (chrisjlocke1) on 2023-02-07 13:38:31 in reply to 3 [link] [source]

Another idea is to simply store Year, Month and Day as columns

Imagining that made my toes curl.

(5) By David Raymond (dvdraymond) on 2023-02-07 13:51:35 in reply to 3 [link] [source]

As for best querying, the best and fastest way would be to declare the column with the event-date as a NUMERIC column and create an index on it.

Out of curiosity, why would you pick NUMERIC rather than TEXT?

All the stored values are going to be text values anyway, and you're using text functions on them. I don't see any reason you'd want to make it NUMERIC unless you're going with julian day or unix epoch. As long as you're going text in ISO-8601, then just call it a TEXT field.

Granted I don't think it'll matter either way, I'm just curious.

(6) By Ryan Smith (cuz) on 2023-02-07 14:07:11 in reply to 5 [link] [source]

Oh I doubt it matters much, it's a good question actually - the only reason I always do it so is this table - See here: datatype3.html.

See the table under 3.1.1 : Affinity Name Examples. In there you can see SQLite itself relays the data-types (if given in a CREATE TABLE for instance) of DATE or DATETIME to NUMERIC, and my (perhaps naive) suspicion has always been that it had a good reason for doing so. Honestly though, I have no real insight and it may well be handled the same as TEXT, I suppose it will take a Dev or some code-scouring to shed light.

(14) By Keith Medcalf (kmedcalf) on 2023-02-07 22:29:42 in reply to 6 [link] [source]

my (perhaps naive) suspicion has always been that it had a good reason for doing so

There is no specific rule that makes 'date' or 'datetime' have numeric affinity.

NUMERIC affinity is simply the fall-though result if none of the other specifically searched for substrings are found.

There is no reason that you cannot declare the type as 'datetime text' or 'date text' which will provide adequate prompts for lay onlookers to know what is contained in the column as well as set the affinity appropriately. I don't think it really matters whether you set the affinity properly or not other than the fact that continually attempting to coerce a text value that continually applying an affinity that can never apply to a value is sure to consume significant resources needlessly.

Note that you can also declare 'julianday real' and 'unixepoch integer' if you want if that is descriptive of what you will be storing in the column.

That is the purpose of a type declaration after all -- to declare the type -- primaily for bags-of-mostly-water looking at the code somewhere down the road.

(13) By Simon Slavin (slavin) on 2023-02-07 21:45:39 in reply to 5 [source]

In some applications you need to count days between two dates. Or do some other maths which involves a number of days. There are also cases where you need to sort many (millions of) events into date order and, of course, sorting numbers is faster than sorting text.

For those applications it can be useful to stores dates as numbers. In most other cases, storing dates as text is superior solution.

(15) By Tim Streater (Clothears) on 2023-02-07 22:49:08 in reply to 13 [link] [source]

It seems to me you store it according to what is best for the usage. I store date/time as seconds since the epoch, so that storage is reduced (a little), and sorting is made easier. And this is the data storage layer. For presentation to the user, conversion is then done according to the user's preferences: with/without am/pm, 12/24 hour clock, y/m/d, d/m/y, m/d/y according to taste. Presentation should not be constrained by the storage format.

(9.1) By CyD (cydside) on 2023-02-07 20:11:50 edited from 9.0 in reply to 3 [link] [source]

Thanks to your code I ended to find out that using "LIKE" statement will NOT benefit of the index as "BETWEEN" and "=" do. Really appreciated.

(10) By Keith Medcalf (kmedcalf) on 2023-02-07 20:26:42 in reply to 9.1 [link] [source]

By default, like is case insensitive so you need to declare your table column as collate nocase, or use collate nocase on the date field in any indexes, in order for the index to be useful. That is:

create table t
(
  date text collate nocase
);
create index i on t(date);

create table t
(
  date text
);
create index i on t(date collate nocase);

Note that if you use the latter rather than the former the following will not use the index for sorting:

select date from t order by date;

because the index is on date collate nocase, not on date.

(17) By CyD (cydside) on 2023-02-10 09:07:10 in reply to 10 [link] [source]

Lesson learned, many thanks.

(7.3) By Keith Medcalf (kmedcalf) on 2023-02-07 20:44:27 edited from 7.2 in reply to 1 [link] [source]

2. Will have many queries for events in the year;
3. Will have many queries for events in the month;
4. Will have enough queries for events in a day.

Did you misstate your requirement in 3 and 4?

Do you mean to have said perhaps "Will have many queries for events in a month of a year; and Will have many queries for events in a day of a month of a year?"

These are quite different things than what you said, which is that you will have many queries for events in a month (irrespective of the year and the day) or the day (irrespective of the year and the month).

If if fact you will be searching for year, year-month, and year-month-day (and not month only and day only), then a simple prefix search will work.

You should not use LIKE as it will introduce extra complication and overhead that is unnecessary and extremenly costly (and a great likelyhood that you will mess it up). LIKE is case insensitive by default and requires that you declare the column to be case-insensitive, or at least that the index on the column is declared to be case insensitive. You should forget about like and just simply do an index search:

create table t
(
-- ... a bunch of columns
   date  text check (date(date, '+0 days') == date),
-- ... a bunch more columns
);
create index t_date on date(date);

Then you can simply use the following where clauses:

-- year
where date between '2019' and '2019x'
-- month of year
where date between '2019-03'  and '2019-03x'
-- day of month of year
where date between '2019-03-14' and '2019=03-14x'
-- or just
where date = '2019-03-14'

or, generically,

where date between ?1 and ?1 || 'x'

to search for everything in a year, month of a year, or day of a month of a year. The check constraint on the table ensures that you only put valid dates (or null) in the date field. You can use whatever character your heart desires for the 'x', provided that it sorts greater than '-'.

Note that the syntax is exactly the same and you merely provide the appropriate year, year-month, or year-month-day in a binding to the statement. Could not be more simple, really.

If you have to search for a month (irrespective of the year) or a day (irrespective of the month and year) then you should store your date in three columns, year, month, and day, so that you can specify year = ... and month = ... and day = ... in the where clause. For example:

create table t
(
  year  integer,
  month integer,
  day   integer,
  date  as (format('%04d-%02d-%02d', year, month, day)) stored,
  check (date(date, '+0 days') == date)
);
create index t_ymd on t(year, month, day);
create index t_md on t(month, day);
create index t_d on t(day);
create index t_date on t(date);

This would let you have your cake and eat it too.

Note that this alternate table definition would work just as well:

create table t
(
   date     text check (date(date, '+0 days') == date),
   year     integer as (strftime('%Y', date)),
   month    integer as (strftime('%m', date)),
   day      integer as (strftime('%d', date))
);
create index t_ymd on t(year, month, day);
create index t_md on t(month, day);
create index t_d on t(day);
create index t_date on t(date);

depending on whether you want to insert year / month / day as individual items or (in the latter case) only insert a date string.

Updated because the computed column date should be stored to avoid repetitive computation since it is likely accessed in every query. year/month/day computed columns are virtual since they are only needed when accessed which will, rpresumably, be seldom. If year/month/day are needed often, then they should be stored to avoid unnecessary recomputation

(11) By CyD (cydside) on 2023-02-07 20:27:40 in reply to 7.2 [link] [source]

Yes, I do mean year, year-month, and year-month-day.
Thanks a lot for your valuable answer.

(12) By anonymous on 2023-02-07 20:37:05 in reply to 1 [link] [source]

The most efficient is to store the date as a number, like the number of seconds since 1970, then create an index on it. This is called an unix timestamp - the number of seconds since 1970-01-01 00:00:00 UTC. And it is one of the standard ways of storing date/time in the DB. You could just ignore the time, but if you need it someday, you will have it at hand. https://www.sqlite.org/lang_datefunc.html

For queries, you just search in the index for the range, using a BETWEEN or >= <= where clause.

It will be faster than using ISO-8601 (no need to parse nor store any text, just a number), and use less storage space. And you could use your language standard library to convert it back/from text, if needed, since Unix timestamp is a well known standard.

(16) By CyD (cydside) on 2023-02-10 08:59:38 in reply to 12 [link] [source]

Sorry for the late answer. Interesting solution, I'll test it, thanks.