Getting a date range using "like"

(1) By Phil Endecott (endecotp) on 2021-08-08 13:17:55 [link] [source]

Dear Experts,

I have a table with a timestamp column, and I'd like to query it for date ranges such as "all of 2010", "all of May 2020", "25th December 1991" etc. i.e. things that are prefixes of the date-time when expressed as a string. So naively I thought I would try to use "like" patterns:

timestamp like "2010-%"
timestamp like "2020-05-%"
timestamp like "1991-12-25%"

That gives the right answers. But I think it's always doing sequential scans. Is there any way that it can do this using an index? In principle a "like" query that just tests a prefix should be implementable using an index, shouldn't it?

An index is used if I rewrite the query as e.g. timestamp >= '2020-05' and timestamp < '2020-06', but forming that sort of query is a bit tedious in general due to the complexities of dates.

How would you do this?

(2) By TripeHound on 2021-08-08 15:30:46 in reply to 1 [link] [source]

Can't test at the moment, but I seem to remember that LIKE uses a NOCASE search, and a default index is case sensitive. You might try making the timestamp column, or the index on it, NOCASE.

(3) By TripeHound on 2021-08-08 20:38:30 in reply to 1 [source]

Can confirm my hunch. Using default collation for field and index:

create table t ( timestamp text, data ) ;
insert into t values ( '2021-08-02', 'date1' ), ( '2021-08-23', 'date2' ), ( '2021-09-01', 'date3' ) ;
create index idx on t( timestamp ) ;
.eqp on

produces a table-scan:

-- Using default index
select * from t where timestamp like '2021-08-%' ;
QUERY PLAN
`--SCAN TABLE t
2021-08-02|date1
2021-08-23|date2

Replacing the index with a COLLATE NOCASE one:

drop index idx ;
create index idx on t( timestamp collate nocase ) ;

uses the index:

-- Using NOCASE index
select * from t where timestamp like '2021-08-%' ;
QUERY PLAN
`--SEARCH TABLE t USING INDEX idx (timestamp>? AND timestamp<?)
2021-08-02|date1
2021-08-23|date2

Alternatively, creating the original field as COLLATE NOCASE and using a default index:

drop table t ;
create table t ( timestamp text collate nocase, data ) ;
insert into t values ( '2021-08-02', 'date1' ), ( '2021-08-23', 'date2' ), ( '2021-09-01', 'date3' ) ;
QUERY PLAN
`--SCAN 3 CONSTANT ROWS
create index idx on t( timestamp ) ;

also uses the index:

-- Using NOCASE collation
select * from t where timestamp like '2021-08-%' ;
QUERY PLAN
`--SEARCH TABLE t USING INDEX idx (timestamp>? AND timestamp<?)
2021-08-02|date1
2021-08-23|date2

(4) By anonymous on 2021-08-08 22:53:26 in reply to 3 [link] [source]

This should do the wanted thing with a case sensitive column:

    timestamp GLOB '2010-*'

etc.

(5) By Phil Endecott (endecotp) on 2021-08-09 12:14:33 in reply to 1 [link] [source]

Thanks everyone. I have this working now. I was unaware that like is case-insensitive; in PostgreSQL there is a separate "ilike".

As a further complication, I just realised that I had created the column with type "datetime" - which gives it integer affinity. I've now changed it to text.

(6) By Keith Medcalf (kmedcalf) on 2021-08-09 22:36:38 in reply to 5 [link] [source]

A declared type of datetime gives that attribute NUMERIC affinity, not INTEGER affinity.

See https://sqlite.org/datatype3.html in particular section 3.1 and the last row in the table at 3.1.1

(7) By Ryan Smith (cuz) on 2021-08-09 23:22:07 in reply to 5 [link] [source]

What Keith said, plus the fact that NUMERIC is perfect for a column with date and time values and should work perfectly for your case.

Also, in SQLite there is a pragma "case_sensitive_like" if you fancy changing that behaviour.