SQLite Forum

How to convert to CTE
Login

How to convert to CTE

(1) By RBS (888888) on 2021-04-22 20:08:36 [link] [source]

Have this SQL, which counts the number of registered patients at a given date. These are integer Excel dates, so where 1/1/1900 is 0:

select count(*) as Cnt from (select id, max(date_status_added) as last_date from reg_history_all where not id in(select id from reg_history_all where reg_status = 'Deceased') and date_status_added < 43000 group by id having reg_status = 'Registered')

Now I would like to do this for all dates between say 1000 and 43000, getting in the output all these dates and counts.

RBS

(2) By Keith Medcalf (kmedcalf) on 2021-04-22 20:30:04 in reply to 1 [link] [source]

Do you mean:

  select last_date, 
         count(distinct id)
    from (
            select id, 
                   max(date_status_added) as last_date 
              from reg_history_all 
             where not id in (
                              select id 
                                from reg_history_all 
                               where reg_status = 'Deceased'
                             ) 
               and date_status_added < 43000 
          group by id 
            having reg_status = 'Registered'
         )
group by last_date
;

(3) By anonymous on 2021-04-22 20:50:39 in reply to 2 [link] [source]

Not sure how this query works, but it gives the wrong values. They are much too low and should slowly go up by date. My query gives the right value.

RBS

(5) By anonymous on 2021-04-22 21:09:59 in reply to 2 [link] [source]

This is a small sample of the data:

ID,DATE_STATUS_ADDED,REG_STATUS 1,21020,Registered 1,21020,Registered 1,37291,Deceased 2,21307,Registered 2,21307,Registered 3,21307,Registered 3,21307,Registered 4,28457,Registered 4,28457,Registered 4,38615,Deceased

RBS

(17) By Ryan Smith (cuz) on 2021-04-23 10:30:58 in reply to 5 [link] [source]

Is this a sample of the data in your table or a sample of the output you want?

If it is the output you want, that is impossible to do in any SQL. The first two lines are exact duplicates of each other, which means the grouping identifier does not exist in the output. Impossible to do, or at least impossible to deduce the question (query) that caused this output.

If it is the actual data, then it at least isn't impossible, but it is much worse. Should those first two lines be added together? I mean they are both "Registered" and both have an ID of 1, and also both have 21020 Registrations, which feels wrong to me.

You should not assume that we know anything much about SQL, let alone your specific data or use case. Please explain it really good cause we are a simple people.

Show some of the data from the actual data base, then some sample of what your output should look like, and please make sure it is correct, else it is like asking us to build your jigsaw puzzle of a parrot by showing us a guide-image of a monkey.

Also, there is a nice feature in Markdown where you add tabular text like that output from the last post, inside block quotes using, for example, three backticks, like this:
```
then all your text you want to display, and again three backticks
```
at the end. That will format it like blocked text (verbatim monospace).

(4.1) By Ryan Smith (cuz) on 2021-04-22 21:14:48 edited from 4.0 in reply to 1 [link] [source]

When you say "for all dates between 1000 and 43000", do you mean for all days that have data, or for all dates that exist, regardless of having data on that day? This would be the requirement if one wants to draw a graph or such over time - and indeed the only reason a CTE would be needed.

I think Keith did a great job of showing how the grouped query should work, so with hopefully his blessing I'm going to simply plagiarize that into a CTE query to make that all-dates variant.

I'm simply making the first CTE table be what Keith already showed, the second CTE table builds a list of all dates, and the final select JOINs these together to fill the values as needed.

[EDIT: Fixed the Query by adding the very needed LEFT JOIN that Keith's astute eye thankfully caught!]

WITH date_ids(last_date, id_count) AS (   -- Keith's magic
  select last_date, 
         count(distinct id)
    from (
            select id, 
                   max(date_status_added) as last_date 
              from reg_history_all 
             where not id in (
                              select id 
                                from reg_history_all 
                               where reg_status = 'Deceased'
                             ) 
               and date_status_added < 43000 
          group by id 
            having reg_status = 'Registered'
         )
group by last_date
), all_dates(day_date) AS (   -- The full dates list from 1000 to 43,000
  select 1000
  union all
  select day_date + 1 from all_dates where day_date < 43000
)
SELECT day_date, id_count
  FROM all_dates
  LEFT JOIN date_ids ON last_date = day_date
;

Couldn't test it, so syntax errors may exist, but it should work in principle.

You can even add calendar dates by figuring out the Julianday number for 1900-01-01 and add that to the day_date field and use the date/time functions in SQLite to display it any way you like.

HTH

(6) By Keith Medcalf (kmedcalf) on 2021-04-22 21:10:45 in reply to 4.0 [link] [source]

That should be a LEFT JOIN if you want to include all dates :)

(7) By Ryan Smith (cuz) on 2021-04-22 21:20:17 in reply to 6 [link] [source]

When you concentrate so hard getting the syntax right you completely miss the glaring elephant of a mistake in the main premise.

Corrected now - Thanks for spotting it!

(9) By anonymous on 2021-04-22 21:34:19 in reply to 7 [link] [source]

To make it a bit clearer this would be the SQL for just one date:

select xsd2xld(22/04/2021) as Cnt_Date, count(*) as Cnt from (select id, max(date_status_added) as last_date from reg_history_all where not id in(select id from reg_history_all where reg_status = 'Deceased') and date_status_added < xsd2xld(22/04/2021) group by id having reg_status = 'Registered')

xsd2xld is just a function converting to the Excel date.

RBS

(18.1) By Ryan Smith (cuz) on 2021-04-23 11:22:32 edited from 18.0 in reply to 9 [link] [source]

Ok, now answer me this: Do these two queries below give the same result?
Because according to SQL rules they must (unless I am missing something), so if they don't we need to really see your data and Schema:

Your original:

SELECT xsd2xld(22/04/2021) as Cnt_Date, count(*) as Cnt
  FROM (
        SELECT id, max(date_status_added) as last_date
          FROM reg_history_all
         WHERE NOT id IN (
               SELECT id
	         FROM reg_history_all
                WHERE reg_status = 'Deceased'
             ) AND date_status_added < xsd2xld(22/04/2021)
         GROUP BY id
	 HAVING reg_status = 'Registered'
       )
;

vs. This simplified version:

SELECT MAX(date_status_added) as Cnt_Date, COUNT(*) AS Cnt,
  FROM reg_history_all
 WHERE reg_status = 'Registered' AND date_status_added < xsd2xld(22/04/2021)
;

EDIT: Perhaps I should explain why the above two queries should be equivalent:

In the first query, you essentially ask in the inner query to first give you a list of all the records, except the records where the status is "Deceased". Once you have that list, Group by id (which does nothing other than allow a HAVING clause) and then using the having clause only give the records where the status is "Registered".

This is very much like me saying to you: Give me a list of all cars on earth, but exclude any Toyotas. Finally, from that, show me only the Fords. There was no need to exclude the Toyotas... you could have just directly asked for the Fords, which already excludes all Toyotas. In this way, asking for "Registered" statuses in the latter part of the query already excludes any "Deceased" statuses, there really is no need to first exclude them specifically.

Further to that, the internal GROUP BY is completely irrelevant as it groups on a Unique value, and even if it isn't a unique value, the implied grouping in the outer query "SELECT xsd2xld(22/04/2021) as Cnt_Date, count(*) as Cnt" actually groups by the real thing you wish to group by, and that is the entire set.

So please test the two queries, and if they actually give different results, there is a vast amount of information we are missing about your schema.

(19) By anonymous on 2021-04-23 18:30:37 in reply to 18.1 [source]

Those 2 queries give very different results and your simplified version gives the wrong result. This is small sample of all the columns of the data:

ID DATE_STATUS_ADDED LINKS_REG_STATUS REG_STATUS PATIENT_TYPE DATE_STATUS_ENDED 239 23527 Application Form FP1 submitted Registered Regular 23527 239 23527 Notification of registration Registered Regular 36549 239 36549 Death Deceased Regular 0 246 32212 Application Form FP1 submitted Registered Regular 32212 246 32212 Notification of registration Registered Regular 39217 246 39217 Internal transfer - address change Left Regular 0 7340 31937 Application Form FP1 submitted Registered Regular 31937 7340 31937 Notification of registration Registered Regular 42451 7340 42451 Other reason Left Regular 42451 7340 42451 Record Requested by FHSA Left Regular 42453 7340 42453 Patient has presented Registered Regular 42453 7340 42453 Application Form FP1 submitted Registered Regular 42453 7340 42453 Record Received Registered Regular 42458 7340 42458 Notification of registration Registered Regular 42467 7340 42467 Patient has presented Registered Regular 42467 7340 42467 Record Received Registered Regular 42467 7340 42467 Application Form FP1 submitted Registered Regular 42482 7340 42467 Other reason Left Regular 42467 7340 42482 Notification of registration Registered Regular 44200 7340 44200 Death Deceased Regular 44202 7340 44202 Record Requested by FHSA Left Regular 44228 7340 44228 Records sent back to FHSA Left Regular 0 19375 35403 Application Form FP1 submitted Registered Regular 35403 19375 35403 Notification of registration Registered Regular 35403 19375 35403 Death Deceased Regular 0

One complication is that there are multiple entries on the same date, but with completely different meanings. I have solved this now by copying the sorted data to a new table:

insert into REG_HISTORY_ALL2 select ID, DATE_STATUS_ADDED, LINKS_REG_STATUS, REG_STATUS, PATIENT_TYPE, DATE_STATUS_ENDED from REG_HISTORY_ALL order by id asc, DATE_STATUS_ADDED asc, REG_STATUS desc

This simplifies my single date query:

select xsd2xld(23/04/2021) as Date, count(*) as Cnt from (select id, max(rowid) as last_date from reg_history_all2 where date_status_added < xsd2xld(23/04/2021) group by id having reg_status = 'Registered')

As now the row with 'Deceased' will be selected after the row with 'Registered if they have the same date_status_added.

RBS

(21) By anonymous on 2021-04-24 04:53:33 in reply to 19 [link] [source]

With the reordered table after doing:

insert into reg_history_all2 select id, date_status_added, links_reg_status, reg_status, patient_type, date_status_ended from reg_history_all order by id asc, date_status_added asc, reg_status desc

the explanation of what the SQL needs to do to count the registered patients at a given date (call this dateX) is simple:

  1. Get the rows with max(rowid) for a given ID (so grouped by ID)
  2. Loop through these rows
  3. if the value of date_status_added is <= dateX then
  4. if the text of reg_status = 'Registered' then add 1 to the count

Something like this will do that (where dateX = 32335):

select 32335, count(*) as cnt from (select date_status_added from reg_history_all where rowid in(select last_rowid from( select id, max(rowid) as last_rowid from reg_history_all where date_status_added <= 32335 group by id having reg_status = 'Registered')))

I have tested this on various dates and this gives the right counts. Just need to figure out now how to change this to a CTE, given all dates and all counts. It doesn't actually need to be all dates, only the unique dates of date_status_added will be fine.

RBS

(22) By anonymous on 2021-04-24 05:15:24 in reply to 21 [link] [source]

To make this SQL a bit neater and faster:

select dateX, count(*) as cnt from (select 1 from reg_history_all where rowid in (select last_rowid from (select id, max(rowid) as last_rowid from reg_history_all where date_status_added <= dateX group by id having reg_status = 'Registered')))

I guess the simplest and fastest way to approach this is to make a table holding only all the final date_status_added dates (final per ID) where reg_status = 'Registered' and run the CTE on that. Will look at that later.

RBS

(24) By anonymous on 2021-04-24 10:01:39 in reply to 21 [link] [source]

How would I turn this SQL into a CTE where the Excel date (here 35100) goes from 8623 to 44310?

select 35100 as last_date, count(*) as cnt from (select 1 from reg_history_all where rowid in(select last_rowid from( select id, max(rowid) as last_rowid from reg_history_all where date_status_added <= 35100 group by id having reg_status = 'Registered')))

RBS

(25.1) By Ryan Smith (cuz) on 2021-04-24 13:43:47 edited from 25.0 in reply to 19 [link] [source]

Ok, this is starting to form a clearer picture - and it means my previous assumption was wrong about id being in any way unique (also pointed out by Keith in another post).

Now your further statements say that this is what needs to be achieved:

  • Get the rows with max(rowid) for a given ID (so grouped by ID)
  • Loop through these rows
  • if the value of date_status_added is <= dateX then
  • if the text of reg_status = 'Registered' then add 1 to the count

This still doesn't answer the question though, it pre-supposes the solution to a non-stated question.

Let me try to state a question I think might be what you want, then you say if it is correct or needs to be adjusted:

"Give a list of the cumulative registration counts per day for any registration type of all people who didn't die at some point".

Is this accurate?

If not, please give the change to that statement that will work. If that is correct, then this below query works on the example data you've given, though it doesn't yield pretty results hard to be sure, all but 1 entry has a deceased ID in the example.

WITH non_deceased_registrations(added_date, added_count) AS (
    SELECT A.DATE_STATUS_ADDED, COUNT(*)
      FROM reg_history A
     WHERE A.REG_STATUS = 'Registered' AND NOT EXISTS(SELECT 1 FROM reg_history B WHERE B.ID = A.ID AND B.REG_STATUS = 'Deceased')
     GROUP BY A.DATE_STATUS_ADDED
     ORDER BY A.DATE_STATUS_ADDED
), all_dates(day_date) AS (
    SELECT 10000 UNION ALL SELECT day_date+1 FROM all_dates WHERE day_date<45000
), cumulative_registrations(day_date, added_total_to_date) AS (
    SELECT day_date, SUM(added_count) OVER (ORDER BY day_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
      FROM all_dates
      LEFT JOIN non_deceased_registrations ON added_date = day_date
)
SELECT day_date, COALESCE(added_total_to_date,0) AS added_total_to_date
  FROM cumulative_registrations

Feel free to adjust those start and end dates (10000 & 45000) to suit your needs or use your excel date convert to state them, I don't have it so couldn't test with it.

Note that this gives a full range of dates between 10000 and 45000 which is a lively 35K of rows to deal with.

Also note, this gives the total of ALL registrations, be it "Notification of Registration" or "Application Form FP1 submitted" or Record Receive, etc. etc. Are you sure that is wanted? I would think we only want to count something like "Notification of Registration" - but then that has to be stated in the question.

Lastly - Your question about "This works, just show me how to put this query in a CTE" is not viable, that query will run forever in a CTE for even a mediocre date-range. The above query will already take some time, but will at least finish in human time scales.

(26) By Ryan Smith (cuz) on 2021-04-24 13:57:49 in reply to 25.1 [link] [source]

This next version works if you need the latest of the registrations per ID only, but for any type of registration - which seems implied in some previous posts:

WITH latest_registrations(added_date, ID) AS (
    SELECT MAX(A.DATE_STATUS_ADDED), ID
      FROM reg_history A
     WHERE A.REG_STATUS = 'Registered' AND NOT EXISTS(SELECT 1 FROM reg_history B WHERE B.ID = A.ID AND B.REG_STATUS = 'Deceased')
     GROUP BY A.ID
     ORDER BY 2,1
), non_deceased_registrations(added_date, added_count) AS (
    SELECT added_date, COUNT(*)
      FROM latest_registrations
     GROUP BY added_date
     ORDER BY added_date
), all_dates(day_date) AS (
    SELECT 10000 UNION ALL SELECT day_date+1 FROM all_dates WHERE day_date<44300
), cumulative_registrations(day_date, added_total_to_date) AS (
    SELECT day_date, SUM(added_count) OVER (ORDER BY day_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
      FROM all_dates
      LEFT JOIN non_deceased_registrations ON added_date = day_date
)
SELECT day_date, COALESCE(added_total_to_date,0) AS added_total_to_date
  FROM cumulative_registrations

and this version works if you want the registration of specifically "Notification of registration" type registrations:

WITH non_deceased_registrations(added_date, added_count) AS (
    SELECT A.DATE_STATUS_ADDED, COUNT(*)
      FROM reg_history A
     WHERE A.REG_STATUS = 'Registered' AND A.LINKS_REG_STATUS = 'Notification of registration'
       AND NOT EXISTS(SELECT 1 FROM reg_history B WHERE B.ID = A.ID AND B.REG_STATUS = 'Deceased')
     GROUP BY A.DATE_STATUS_ADDED
     ORDER BY A.DATE_STATUS_ADDED
), all_dates(day_date) AS (
    SELECT 10000 UNION ALL SELECT day_date+1 FROM all_dates WHERE day_date<44300
), cumulative_registrations(day_date, added_total_to_date) AS (
    SELECT day_date, SUM(added_count) OVER (ORDER BY day_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
      FROM all_dates
      LEFT JOIN non_deceased_registrations ON added_date = day_date
)
SELECT day_date, COALESCE(added_total_to_date,0) AS added_total_to_date
  FROM cumulative_registrations

(27) By anonymous on 2021-04-24 15:50:53 in reply to 26 [link] [source]

Thanks for that. It runs and it is surprisingly fast, but the results are wrong. Very complex query, but noticed you use: MAX(A.DATE_STATUS_ADDED) and not sure that can work as there are duplicate dates for the same ID. I think you may need better (more) sample data to test. Can I post a .csv file as an attachment?

RBS

(28) By Larry Brasfield (larrybr) on 2021-04-24 16:05:03 in reply to 27 [link] [source]

Attachments are not supported in this forum.

(29) By anonymous on 2021-04-24 16:06:20 in reply to 28 [link] [source]

Ok, thanks.

RBS

(30) By anonymous on 2021-04-24 16:28:56 in reply to 26 [link] [source]

This is getting a lot closer:

WITH latest_registrations(added_date, ID) AS( SELECT MAX(A.rowid), ID FROM reg_history_all A WHERE A.REG_STATUS = 'Registered' GROUP BY A.ID ORDER BY 2,1 ), non_deceased_registrations(added_date, added_count) AS( SELECT added_date, COUNT(*) FROM latest_registrations GROUP BY added_date ORDER BY added_date ), all_dates(day_date) AS( SELECT 100 UNION ALL SELECT day_date + 1 FROM all_dates WHERE day_date < 44300 ), cumulative_registrations(day_date, cnt) AS( SELECT day_date, SUM(added_count) OVER(ORDER BY day_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM all_dates LEFT JOIN non_deceased_registrations ON added_date = day_date ) SELECT day_date, COALESCE(cnt,0) AS cnt FROM cumulative_registrations

I noticed counts seem only to go up by date, but that can't be right as the number of registered patients can go down (and do go down) as patients die or leave.

RBS

(31) By Ryan Smith (cuz) on 2021-04-25 03:37:54 in reply to 30 [link] [source]

I noticed counts seem only to go up by date

Shouldn't it? I thought you were interested in the cumulative totals of registered people over time. That's why we ask about stating your question in English - it is not obvious to me that the totals shouldn't only go up.

but that can't be right as the number of registered patients can go down (and do go down) as patients die or leave.

So what is it that you want to see for each day then? Only the people that are currently registered when that day rolls by? for every day? I can do that too, but your query that you gave originally wouldn't do that correctly.

Would the question statement then become: Give a list of the total number of people that are registered at the time (having already registered, but not yet left or died) for each day along the date range?

You keep giving feedback about how "close" the query is or what doesn't seem "right" with it, in stead of just saying what it is you want. we keep trying to guess what you want and offering possible question statements trying to define what you need from the query, and you keep ignoring it. Pretty please, with sugar on top, tell me what is wrong with the question statement I gave above, or put what you want in your own words in simple English so that there can be no doubt what you wish to see from that query. You obviously have a clear idea of what you want, if only you can explain that idea fully to us. Explain it like I'm 5 years old. :)

Perhaps you assume the query needs to be advanced or difficult - let me say that making the query is extremely easy once you can state your question in plain English. Understanding what you really want from the query is the only difficult part.

Also, it's really hard to test without some proper data, so that plan to upload some data is a good idea. Perhaps add it to a file host site and post the link? Box or Drop-box or such will do.

(32.5) By Keith Medcalf (kmedcalf) on 2021-04-25 07:15:01 edited from 32.4 in reply to 31 [link] [source]

with reg1(id, date, status, pstatus)
  as (
          select id,
                 date_status_added,
                 reg_status,
                 lag(reg_status, 1) over (partition by id order by date_status_added, reg_status desc)
            from reg_history_all
           where reg_status in ('Registered', 'Deceased')
        order by id, date_status_added, reg_status desc
     ),
     reg2(id, date, cnt)
  as (
        select id,
               date,
               iif(status == 'Registered', 1, iif(pstatus == 'Registered', -1, 0))
          from reg1
         where status IS NOT pstatus
     ),
     reg3(date, cnt)
  as (
          select date,
                 sum(cnt)
            from reg2
        group by date
        order by date
     )
  select date,
         sum(cnt) over (rows between unbounded preceding and current row)
    from reg3
order by date
;

Edited for correctness

(33) By anonymous on 2021-04-25 10:23:09 in reply to 32.5 [link] [source]

Thanks for that, this works nicely indeed. I had to make 2 changes: 1. I am coding for Android and the current SQLite (SQLiteCipher) version (3.25.2) doesn't support Iif. 2. You left out reg_status 'Left.


with reg1(id, date, status, pstatus) as ( select id, date_status_added, reg_status, lag(reg_status, 1) over (partition by id order by date_status_added, reg_status desc) from reg_history_all where reg_status in ('Registered', 'Deceased', 'Left') order by id, date_status_added, reg_status desc ), reg2(id, date, cnt) as ( select id, date, case when status == 'Registered' then 1 else case when pstatus == 'Registered' then -1 else 0 end end from reg1 where status IS NOT pstatus ), reg3(date, cnt) as ( select date, sum(cnt) from reg2 group by date order by date ) select date, sum(cnt) over (rows between unbounded preceding and current row) as Cnt from reg3 order by date

I compared the output of your SQL with my earlier single date query, run in a loop and results are the same.

RBS

(34) By Keith Medcalf (kmedcalf) on 2021-04-25 17:22:31 in reply to 33 [link] [source]

Excellent -- though I did not see any mention of a status of "Left" in any previous posting other than in anecdotal comments about "Leaving".

So reg1 takes the source data and labels the pstatus (previous reg_status) for each (first record for an id has no prior reg_status). The ordering is by reg_status desc so that Deceased appears after Registered if someone has both on the same date.

Reg2 then only considers status changes (status IS NOT pstatus (note that pstatus can be NULL if the id had no prior status)). If the status change is to 'Registered' then the cnt is +1 and if the pstatus was 'Registered' but is now something else (like 'Deceased') then the count is '-1'. The nesting is to allow reg_status other than 'Deceased' to be included in reg1 (as you have done) and tracks only status == 'Registered'.

Reg3 computes the total cnt change for each date.

The final query computes the running total of each days count for each date.

So the correct initial question would be to have said:

I have a table reg_history_all containing columns id, date_status_added, reg_status. reg_status indicates the registration status at the time the record was added. There may be multiple entries per id per date. I am interested in the running count of unique id's with status 'Registered' on each date. A registration is terminated by a reg_status of 'Deceased' or 'Left'. There may be other status that need to be removed from the data. There is no guarantee that an id was "Registered" prior to "Deceased" or "Left" and no guarantee that they were not transitioned through all three states on one day.

(35) By anonymous on 2021-04-25 17:57:31 in reply to 34 [link] [source]

Very nice indeed as it is very fast as well. I am not familiar with CTE's and Window functions, but will study this query. Yes, my initial question should have been clearer, but I think post 21 explained the requirements well enough. Thanks again for solving this.

RBS

(36) By Keith Medcalf (kmedcalf) on 2021-04-25 19:35:40 in reply to 35 [link] [source]

Your welcome, however, that referenced post does not explain what results you want to obtain, but rather a preconceived notion of how to obtain those results.

SQL is a declarative language and as such is predicated on what you want, not how to obtain it.

(37) By Keith Medcalf (kmedcalf) on 2021-04-25 20:22:43 in reply to 35 [link] [source]

Also note that you need to push the final accumulation into the CTE if you want a partial date range in order to prevent the date constraint being pushed into the cumulative tally calculations.

with reg1(id, date, status, pstatus)
  as (
          select id,
                 date_status_added,
                 reg_status,
                 lag(reg_status, 1) over (partition by id order by date_status_added, reg_status desc)
            from reg_history_all
           where reg_status in ('Registered', 'Left', 'Deceased')
        order by id, date_status_added, reg_status desc
     ),
     reg2(id, date, cnt)
  as (
        select id,
               date,
               case when status == 'Registered' then 1
                    else case when pstatus == 'Registered' then -1
                              else 0
                          end
                end
          from reg1
         where status IS NOT pstatus
     ),
     reg3(date, cnt)
  as (
          select date,
                 sum(cnt)
            from reg2
        group by date
        order by date
     ),
     reg4(date, cnt)
  as (
        select date,
               sum(cnt) over (rows between unbounded preceding and current row)
          from reg3
      order by date
     )
select date,
       cnt
  from reg4
 where date between 4500 and 4600
;

(20.1) By Keith Medcalf (kmedcalf) on 2021-04-23 23:52:47 edited from 20.0 in reply to 18.1 [link] [source]

In the first query, you essentially ask in the inner query to first give you a list of all the records, except the records where the status is "Deceased". Once you have that list, Group by id (which does nothing other than allow a HAVING clause) and then using the having clause only give the records where the status is "Registered".

This is incorrect.

The inner mostest query returns a list of all id's which are 'Deceased' ever, for exclusion from any further processing. Only records for id's where 'Deceased' is NEVER recorded throughout all eternity are candidates for further processing.

This set of candidate rows is further culled by the requirement that the date_status_added is less than the given constraint.

The remaining candidate rows are "grouped" by id (which is not unique and cannot be unique) and the row containing the max(date_status_added) is found -- this does not consider the reg_status (which could be 'Unregistered', 'Pregnant', or 'SARS-CoV-2 Infected', but will not be 'Deceased' at any time in its history).

Amongst the candidates (peer group) having that id (group by) and the found max value, one of those rows must have a reg_status of 'Registered'. Groups which do not meet this condition are skipped (they are removed as candidates).

Then, the number of id so found are counted. What exactly this is supposed to represent I have no clue. It most certainly is not the number of 'Registered' patients (but not yet dead) patients on the specified day.

The second query returns the count of the number of records where the date is less than a specified date and the reg_status is 'Registered'.

There is no "sameness" between the two queries except by happenstance.

Since we have no description of what it is that the OP is trying to obtain it will be impossible to provide any "how" to get it.

(8) By anonymous on 2021-04-22 21:23:08 in reply to 4.0 [link] [source]

It indeed needs to be all the dates in the range. Your query does give all dates but again gives the same wrong counts. The basic query needs to be like mine, except with all dates in the range.

RBS

(10) By Keith Medcalf (kmedcalf) on 2021-04-22 21:34:40 in reply to 8 [link] [source]

Then your definition of what to count is incorrect.

(11) By anonymous on 2021-04-22 21:40:38 in reply to 10 [link] [source]

Maybe I explained wrongly, but I get the right values and I think you misunderstand the data, which I posted, but I think you haven't seen yet.

RBS

(12) By Keith Medcalf (kmedcalf) on 2021-04-22 21:50:00 in reply to 10 [link] [source]

Perhaps you need to state what you want in English first.

For example, your original query translates into the following:

Consider a table reg_history_all which contains the columns id, date_status_added, and reg_status. For each id in the table find the max date_status_added excluding all id that ever have reg_status 'Deceased' but only consider records where the reg_status_added is less than 43000. Include only results where at least one record in the group has reg_status 'Registered'. Then count the number of resulting rows.

(13.2) By Keith Medcalf (kmedcalf) on 2021-04-22 23:13:21 edited from 13.1 in reply to 10 [link] [source]

For example. If the data that you want is defined as:

For the table reg_history_all with the fields id, date_status_added, and reg_status then for each date in the range of min(date_status_added) to max(date_status_added) calculate the number of unique id that have reg_status = 'Registered' on or before that date less the number of unique id that have reg_status = 'Deceased' on or before that date.

That is, calculate the number of "active id" for each date where "active" is defined as "Registered" on or before the date and not "Deceased" on or before that date, where each "id" is always "Registered" before it becomes "Deceased".

You can translate that into SQL directly.

with date_range(date) -- the dates we are interested in
  as (
         select min(date_status_added)
           from reg_history_all
      union all
         select date + 1
           from date_range
          where date < (
                        select max(date_status_added)
                          from reg_history_all
                       )
     )
select date,
       (
        select count(distinct id)
          from reg_history_all
         where date_status_added <= date
           and reg_status == 'Registered'
       )
       -
       (
        select count(distinct id)
          from reg_history_all
         where date_status_all <= date
           and reg_status == 'Deceased'
       )
  from date_range
;

There may be other (more efficient) ways to compute this depending on the quality of the source data (additional constraints).

(14) By anonymous on 2021-04-23 03:00:16 in reply to 13.2 [link] [source]

Thanks for that. The description of what the result should be sounds ok but I get wrong results, much too high numbers. The other problem is that it is very slow. The table reg_history_all has some 145000 rows. Would it be possible to run the sql I posted as a CTE with increasing dates in the date range?

RBS

(15) By Keith Medcalf (kmedcalf) on 2021-04-23 05:11:18 in reply to 14 [link] [source]

Did you fix the typo?

Of course it is slow. You probably do not have useful indexes defined.

If you need a different set of answers then describe whit it is that you want. In English. With no assumptions. And no pre-assumed answers.

(16) By anonymous on 2021-04-23 06:28:38 in reply to 15 [link] [source]

Yes, typo was fixed. Will see if I can explain in English the logic of the posted query that works. Will also see if I can add indexes to speed things up.

RBS

(23) By Keith Medcalf (kmedcalf) on 2021-04-24 06:41:30 in reply to 14 [link] [source]

No.

Referents cannot be passed between source tables but only to correlates.