SQLite Forum

Count distinct and window function
Login

Count distinct and window function

(1) By anonymous on 2021-09-18 23:29:34 [source]

Help with window function

I am trying to count the number of distinct patients so far for any given day.

Here is a simplified example of what I am trying to achieve:

create table visit(datestring,timestring,patient);

insert into visit(datestring,timestring,patient) values ('2021-09-01','09:00','A');

insert into visit(datestring,timestring,patient) values ('2021-09-01','12:00','A');

insert into visit(datestring,timestring,patient) values ('2021-09-01','10:00','B');

insert into visit(datestring,timestring,patient) values ('2021-09-01','11:00','B');

insert into visit(datestring,timestring,patient) values ('2021-09-02','09:00','D');

insert into visit(datestring,timestring,patient) values ('2021-09-02','12:00','C');

insert into visit(datestring,timestring,patient) values ('2021-09-02','10:00','B');

insert into visit(datestring,timestring,patient) values ('2021-09-02','13:00','B');

which gives:

select * from visit order by datestring,timestring;

datestring timestring patient
2021-09-01 09:00 A
2021-09-01 10:00 B
2021-09-01 11:00 B
2021-09-01 12:00 A
2021-09-02 09:00 D
2021-09-02 10:00 B
2021-09-02 12:00 C
2021-09-02 13:00 B

The closest I could go to the expected result is here:

select datestring,timestring,patient,(case when count() over (partition by datestring,patient order by timestring rows between unbounded preceding and current row) > 1 then 0 else 1 end) as distinct_count from visit order by datestring,timestring;

which gives:

datestring timestring patient distinct_count
2021-09-01 09:00 A 1
2021-09-01 10:00 B 1
2021-09-01 11:00 B 0
2021-09-01 12:00 A 0
2021-09-02 09:00 D 1
2021-09-02 10:00 B 1
2021-09-02 12:00 C 1
2021-09-02 13:00 B 0

How do I print the sum of the distinct_count over all the preceding rows which have the same datestring (including the current row)? It would this:

datestring timestring patient distinct_count what_I_want
2021-09-01 09:00 A 1 1
2021-09-01 10:00 B 1 2
2021-09-01 11:00 B 0 2
2021-09-01 12:00 A 0 2
2021-09-02 09:00 D 1 1
2021-09-02 10:00 B 1 2
2021-09-02 12:00 C 1 3
2021-09-02 13:00 B 0 3

I have spent a lot of time trying to figure this out. Any help or pointer would be greatly appreciated. Thank you.