I have a coding question on a large dataset. I'm trying to use the 'runner' function because I think this solution required a rolling window function and also it's a large dataset so I'm doubtful that a for loop would work.
Here is a recreation of the relevant parts of that dataset.
id <- c("1111", "1111", "1111", "1111", "2222", "2222", "2222", "2222", "3333", "3333")
date <- c("2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-01-01", "2022-02-01")
drug <- c("Drug A", "Drug B", "Drug A", "Drug A", "Drug B", "Drug A", "Drug A", "Drug B", "Drug A", "Drug B")
day_supply <- c(1, 2, 3, 5, 8, 13, 21, 34, 21, 13)
drug_table <- tibble(id, date, drug, day_supply)
drug_table$date <- as.Date(drug_table$date)
For the research question, continuous treatment in a drug is defined as having a 180-day window where the prescription refill gap is <= 7 in any 180-day stretch of time.
I'm working out some of the logic here in this code:
drug_table %>%
mutate(drug_will_be_used_until = date + day_supply) %>%
mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until))
As a first step, I'm calculating the number of days that have passed without a prescription, at the time a new prescription is generated. But what about the rolling window function part? I found this library called 'runner' that I think is probably useful for this problem. Here is what my code looks like when trying to use runner.
library(runner)
drug_table %>%
group_by(id) %>%
mutate(drug_will_be_used_until = date + day_supply) %>%
mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until)) %>%
arrange(id, date) %>%
filter(refill_gap_at_time_of_new_prescription_since_last_prescription != "NA days") %>%
group_by(id) %>%
mutate(max_refill_gap_in_best_180_day_stretch = runner::runner(refill_gap_at_time_of_new_prescription_since_last_prescription, min, k = 180, idx = date))
I think this answer is kind of close, but I don't think it quite answers my question. With this code, I'm finding the maximum refill gap in any 180-day stretch... but what I really want to figure out is what is the maximum refill gap in any 180-day stretch, where that maximum refill gap is the smallest maximum of any 180-day stretch... The exact research question, that I think that logic would answer, is: Is there any 180-day period where the refill gap is not more than 7-days?
I'm trying to create a new reference table that looks like this:
id max_refill_gap_in_best_180_day_stretch 180_day_continuous_use_indicator
1111 6 1
2222 7 1
3333 8 0
(This table doesn't correspond to the dataframe I created, above.)
The binary indicator should say 0 if patient id did not have 180 days of continuous treatment in a drug, and the binary indicator should say 1 if they did have 180 days of continuous treatment in a drug.
Updating here the question based on feedback from the comments:
I created a new dataframe. (The method here is a little bit inelegant.)
id <- rep(c("1111", "1111", "1111", "1111", "1111"), times = 73)
date <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by="days")
# drug <- rep(c("Drug A", "Drug B", "Drug A", "Drug A", "Drug A"), times = 73)
day_supply <- rep(c(1, 2, 3, 4, 5), times = 73)
drug_table <- tibble(id, date, day_supply)
drug_table$date <- as.Date(drug_table$date)
drug_table <- drug_table %>%
slice(which(row_number() %% 6 == 1))
drug_table <- drug_table %>%
filter(date != "2018-01-13") %>%
filter(date != "2018-01-19")
drug_table %>%
mutate(drug_will_be_used_until = date + day_supply) %>%
mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until)) -> drug_table
######
id <- rep(c("2222", "2222", "2222", "2222", "2222"), times = 73)
date <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by="days")
# drug <- rep(c("Drug A", "Drug B", "Drug A", "Drug A", "Drug A"), times = 73)
day_supply <- rep(c(1, 2, 3, 4, 5), times = 73)
drug_table2 <- tibble(id, date, day_supply)
drug_table2$date <- as.Date(drug_table2$date)
drug_table2 <- drug_table2 %>%
slice(which(row_number() %% 6 == 1))
drug_table2 <- drug_table2 %>%
filter(date != "2018-01-13") %>%
filter(date != "2018-01-19") %>%
filter(date != "2018-02-12") %>%
filter(date != "2018-02-18") %>%
filter(date != "2018-03-14") %>%
filter(date != "2018-03-20") %>%
filter(date != "2018-04-13") %>%
filter(date != "2018-04-19") %>%
filter(date != "2018-05-13") %>%
filter(date != "2018-05-19") %>%
filter(date != "2018-06-12") %>%
filter(date != "2018-06-18") %>%
filter(date != "2018-06-12") %>%
filter(date != "2018-06-18") %>%
filter(date != "2018-07-12") %>%
filter(date != "2018-07-18") %>%
filter(date != "2018-07-12") %>%
filter(date != "2018-07-18") %>%
filter(date != "2018-08-11") %>%
filter(date != "2018-08-17") %>%
filter(date != "2018-09-10") %>%
filter(date != "2018-09-16") %>%
filter(date != "2018-10-10") %>%
filter(date != "2018-10-16") %>%
filter(date != "2018-11-09") %>%
filter(date != "2018-11-15")
drug_table2 %>%
mutate(drug_will_be_used_until = date + day_supply) %>%
mutate(refill_gap_at_time_of_new_prescription_since_last_prescription = date - lag(drug_will_be_used_until)) -> drug_table2
drug_table_df <- bind_rows(drug_table,
drug_table2)
#####
id2 <- c("1111", "2222")
max_refill_gap_in_best_180_day_stretch <- c(5, 16)
the_180_day_continuous_use_indicator <- c(1, 1)
drug_table_output <- tibble(id2, max_refill_gap_in_best_180_day_stretch, the_180_day_continuous_use_indicator)
In the first case, I have the id "1111" with a dataframe that has one refill gap of 16 days early on. But starting at the end of January, any stretch of 180 days is associated with a maximum refill gap of 5 days.
In the second case, I have the id "2222" but less frequent visits mean that there's a refill gap of 16 days that occurs pretty much every month. So in any 180 day stretch, there's going to be a refill gap of 16 days.
So we can say with patient 1111, there exists a 180 day stretch where the maximum refill gap is under 7 days. But with patient 2222, there doesn't exist any 180 day period where the maximum refill gap is under 7 days.