2

Suppose I have patient admission/claim wise data like the sample below. Data type of patient_id and hosp_id columns is VARCHAR

Table name claims

rec_no patient_id hosp_id admn_date discharge_date
1 1 1 01-01-2020 10-01-2020
2 2 1 31-12-2019 11-01-2020
3 1 1 11-01-2020 15-01-2020
4 3 1 04-01-2020 10-01-2020
5 1 2 16-01-2020 17-01-2020
6 4 2 01-01-2020 10-01-2020
7 5 2 02-01-2020 11-01-2020
8 6 2 03-01-2020 12-01-2020
9 7 2 04-01-2020 13-01-2020
10 2 1 31-12-2019 10-01-2020

I have another table wherein bed strength/max occupancy strength of hospitals are stored.

table name beds

hosp_id bed_strength
1 3
2 4

Expected Results I want to find out hospital-wise dates where its declared bed-strength has exceeded on any day.

Code I have tried Nothing as I am new to SQL. However, I can solve this in R with the following strategy

  • pivot_longer the dates
  • tidyr::complete() missing dates in between
  • summarise or aggregate results for each date.

Simultaneously, I also want to know that whether it can be done without pivoting (if any) in sql because in the claims table there are 15 million + rows and pivoting really really slows down the process. Please help.

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • 1
    You need to count overlapping intervals. Various solutions exist. – Salman A Jul 30 '21 at 10:07
  • can you provide me link with any one of them? – AnilGoyal Jul 30 '21 at 10:10
  • 1
    Try the approach in [this answer](https://stackoverflow.com/q/53495546). Basically you use union all to create a set of entry and exit dates, then perform a running sum... increase it on entry and decrease it on exit. The dates with sum greater than x are your target. – Salman A Jul 30 '21 at 10:17
  • 1
    @AnilGoyal can you also add the expected results? – Jim Jones Jul 30 '21 at 10:25

1 Answers1

3

You can use generate_series() to do something very similar in Postgres. For the occupancy by date:

select c.hosp_id, gs.date, count(*) as occupanyc
from claims c cross join lateral
     generate_series(admn_date, discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date;

Then use this as a subquery to get the dates that exceed the threshold:

select hd.*, b.strength
from (select c.hosp_id, gs.date, count(*) as occupancy
      from claims c cross join lateral
           generate_series(c.admn_date, c.discharge_date, interval '1 day') gs(date)
      group by c.hosp_id, gs.date
     ) hd join
     beds b
     using (hosp_id)
where h.occupancy > b.strength
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon for the help. It works. I wasn't aware of this `generate series` yet. I think it works something like `tidyr::complete()` in `r` – AnilGoyal Jul 30 '21 at 11:11
  • Can you please explain what is `lateral` doing here? I am trying to understand the syntax – AnilGoyal Jul 30 '21 at 13:53
  • 1
    @AnilGoyal . . . In this case, nothing actually because Postgres allows `generate_series()` to refer to columns in previously defined tables. However, this is really an example of a lateral join, so I like to be explicit. – Gordon Linoff Jul 30 '21 at 20:50