0

I'm trying to use R to measure how many days supply of a prescription an individual already has on-hand when they make a refill, taking into account all previous prescriptions. For example, if I had this table...

  member rx_id  fill_date    to_date days_supply
1      A     1 2018-10-01 2018-10-02           2
2      B     1 2016-11-07 2016-11-10           4
3      B     2 2016-11-07 2016-12-04          28
4      B     3 2016-11-08 2016-11-09           2
5      B     4 2016-11-10 2016-12-03          24

I'd expect the following output

  member rx_id  fill_date    to_date days_supply_on_hand
1      A     1 2018-10-01 2018-10-02                   0
2      B     1 2016-11-07 2016-11-10                   0
3      B     2 2016-11-07 2016-12-04                   4
4      B     3 2016-11-08 2016-11-09                  30
5      B     4 2016-11-10 2016-12-03                  26

For member B, when the second script is filled on the same day as the first script, the individual already has 4 days worth of RX on hand. When the third script is filled, the individual has 3 days left from the first script and 27 left from the second (30 total). When the fourth script is filled, the third script is depleted, but there is 1 day left from the first script and 25 from the third script (26 total).

I know how to do rolling totals in both dplyr and data.table, but I can't figure out how to take into account variable levels of depletion based on previous records on an individual by individual basis. Below is code to remake the original table, thanks in advance for any suggestions!

structure(list(member = structure(c(1L, 2L, 2L, 2L, 2L), .Label = 
c("A", 
"B"), class = "factor"), rx_id = c(1, 1, 2, 3, 4), fill_date = 
structure(c(17805, 
17112, 17112, 17113, 17115), class = "Date"), to_date = 
structure(c(17806, 
17115, 17139, 17114, 17138), class = "Date"), days_supply = c(2, 
4, 28, 2, 24)), .Names = c("member", "rx_id", "fill_date", 
"to_date", 
"days_supply"), row.names = c(NA, -5L), class = "data.frame")
T. Shiftlet
  • 158
  • 13

2 Answers2

2
library(data.table)
dt = as.data.table(your_df) # or setDT to convert in place

# merge on relevant days, then compute sum of supply - days elapsed
dt[dt, on = .(member, fill_date <= fill_date, to_date >= fill_date, rx_id < rx_id), by = .EACHI,
   sum(days_supply, na.rm = T) - sum(i.fill_date - x.fill_date, na.rm = T)]
#   member  fill_date    to_date rx_id      V1
#1:      A 2018-10-01 2018-10-01     1  0 days
#2:      B 2016-11-07 2016-11-07     1  0 days
#3:      B 2016-11-07 2016-11-07     2  4 days
#4:      B 2016-11-08 2016-11-08     3 30 days
#5:      B 2016-11-10 2016-11-10     4 26 days
eddi
  • 49,088
  • 6
  • 104
  • 155
  • Upon a closer look at the result, it looks like the to_date column is always producing the same result as the fill_date column. – T. Shiftlet Dec 27 '18 at 21:13
  • 2
    That's an unfortunate issue with current column naming in `data.table` (columns are named based on lhs of the `on`, instead of the `rhs`, which should make it clear why those columns are the same - they're both `fill_date` from rhs). Add `i.to_date` to the output. – eddi Dec 27 '18 at 21:46
  • best way to deal with column naming in such cases is to use `x.` and `i.` prefixes – jangorecki Dec 31 '18 at 06:51
  • As a follow-up, any chance you have an idea of how to replicate this in a SQL dialect like postgres...? – T. Shiftlet Feb 21 '19 at 15:19
1

Using a simple loop

dt$days_supply_on_hand <- 0
for (a in unique(dt$member)) {
  I <- which(.subset2(dt,1) == a)
  flDate <- as.integer(.subset2(dt,3)[I])
  toDate <- as.integer(.subset2(dt,4)[I])
  V <- vapply(seq_along(I), function (k) sum(toDate[1:(k-1)] - flDate[k] + 1), numeric(1))
  dt$days_supply_on_hand[I] <- c(0,V[-1])
}
dt
  member rx_id  fill_date    to_date days_supply days_supply_on_hand
1      A     1 2018-10-01 2018-10-02           2                   0
2      B     1 2016-11-07 2016-11-10           4                   0
3      B     2 2016-11-07 2016-12-04          28                   4
4      B     3 2016-11-08 2016-11-09           2                  30
5      B     4 2016-11-10 2016-12-03          24                  26

where dt is data frame provided above. (Note that the use of .subset2 or as.integer is for efficiency purposes - they can be changed for more readability).

niko
  • 5,253
  • 1
  • 12
  • 32