I have data that looks like this:
df <- data.frame(
  ID = c(rep("A12345", 5), rep("A23456", 10), rep("A34567", 5), "A45678", "A67891", rep("A78910", 8), "A91011",
         rep("A10111", 4), rep("A11121", 3), "A12131", "A16731"),
  medication = c(rep("colchicine", 5), rep("febuxosat", 9), "hosps", rep("colchicine", 5), "hosps", "colchicine",
                 rep("allopurinol", 8), "allopurinol",
                 rep("colchicine", 3), "hosps", rep("colchicine", 3), "colchicine", "allopurinol"),
  Date = c("2004-12-08", "2005-01-28", "2005-07-15", "2005-08-23", "2005-11-30", "2007-02-01", "2007-07-20", "2014-06-03",
           "2008-04-17",
           "2008-12-19", "2009-09-09", "2010-02-24", "2010-11-01", "2010-12-03", "2011-08-10", "2012-11-05", "2012-12-17",
           "2012-12-19", "2013-10-03", "2013-12-11", "2014-03-26", "2015-11-12", "2014-08-07", "2008-01-31", "2008-02-21",
           "2008-09-19", "2008-11-06", "2009-01-06", "2009-01-14", "2009-03-25", "2009-03-27", "2009-06-18", "2009-08-18",
           "2009-09-08", "2009-11-13", "2010-01-21", "2010-04-19", "2010-07-07", "2010-08-06", "2010-08-19")
)
I then want to create a year variable from the date, group the rows by unique ID and year, and compute a variable that counts how many times each ID received medication in that year:
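library(dplyr)
library(tidyr)  # for unite()

df <- df %>%
  # extract the four-digit year from the date string
  mutate(year = as.numeric(substr(Date, 1, 4))) %>%
  group_by(ID) %>%
  # 1 for a medication row, 0 for anything else (e.g. "hosps")
  mutate(meds_count = ifelse(medication %in% c("colchicine", "allopurinol", "febuxosat"), 1, 0)) %>%
  unite(ID_year, ID, year, sep = "_", remove = FALSE) %>%
  group_by(ID_year) %>%
  # total medication rows per ID and year
  mutate(meds_sum = sum(meds_count)) %>%
  # keep one row per ID and year
  distinct(ID_year, .keep_all = TRUE)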
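For comparison, the same per-ID, per-year count could probably be written more directly with summarise(). This is only a minimal sketch on made-up toy data (the `toy` values and the `gout_meds` name are hypothetical, not part of my real data), and it assumes dplyr 1.0.0 or later for the `.groups` argument:

```r
library(dplyr)

# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  ID = c("A1", "A1", "A1", "A1", "A2"),
  medication = c("colchicine", "hosps", "colchicine", "colchicine", "allopurinol"),
  Date = c("2005-01-28", "2005-07-15", "2005-08-23", "2006-02-01", "2009-03-27")
)

# Medications that count towards the total ("hosps" does not)
gout_meds <- c("colchicine", "allopurinol", "febuxosat")

toy_counts <- toy %>%
  mutate(year = as.numeric(substr(Date, 1, 4))) %>%
  group_by(ID, year) %>%
  # one row per ID and year, returned ungrouped
  summarise(meds_sum = sum(medication %in% gout_meds), .groups = "drop")
```

Unlike the mutate() plus distinct() version, this drops the other columns and returns an ungrouped table sorted by ID and year.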
Then I create a new variable, 'gout', which is 1 if meds_sum is greater than or equal to 4, and 0 otherwise:
df <- df %>%
  mutate(gout = ifelse(meds_sum >= 4, 1, 0))
Then I want to create a new variable, 'gout2', which is 1 if meds_sum is greater than or equal to 4, and is also 1 if meds_sum is non-zero in the year before or after. This is my attempt at that last step, but lead() and lag() produce NA values in this code:
df <- df %>%
  mutate(gout2 = ifelse((meds_sum >= 4 & ((lead(meds_sum) >= 1 | lag(meds_sum)) >= 1)), 1, 0))
Can anyone tell me what I'm doing wrong?
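One thing that may be relevant: the data frame is still grouped by ID_year at this point, and lead() and lag() inside mutate() work within each group. The sketch below uses hypothetical toy values (not my real data) to show what happens when every group has exactly one row:

```r
library(dplyr)

# Toy table with one row per ID_year (hypothetical values)
toy <- data.frame(
  ID_year = c("A1_2004", "A1_2005", "A1_2006"),
  meds_sum = c(1, 4, 2)
)

# Grouped by ID_year: each group holds a single row, so there is
# no previous or next row inside the group and both columns are NA
grouped <- toy %>%
  group_by(ID_year) %>%
  mutate(prev = lag(meds_sum), nxt = lead(meds_sum))

# Ungrouped: lag()/lead() look at adjacent rows of the whole table
ungrouped <- toy %>%
  mutate(prev = lag(meds_sum), nxt = lead(meds_sum))
```

Here `grouped$prev` and `grouped$nxt` are all NA, while the ungrouped version only has NA at the first and last row. I suspect the misplaced parenthesis in my condition, `(lead(meds_sum) >= 1 | lag(meds_sum)) >= 1`, is a second problem, since it compares a logical value with 1.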
This is what I would like the output to look like:
df$gout2 <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0)
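To make the intended rule concrete, here is a minimal sketch under one reading of it: a row is flagged if meds_sum >= 4, or if meds_sum is non-zero and the same ID has meds_sum >= 4 in the calendar year immediately before or after. That reading is an assumption, although it appears consistent with the vector above when the rows are sorted by ID and year. The `counts` table and the `prev_hit` / `next_hit` helper columns are hypothetical names with toy values:

```r
library(dplyr)

# Hypothetical per-ID, per-year counts (illustrative values only)
counts <- data.frame(
  ID = c("A1", "A1", "A1", "A2", "A2", "A3"),
  year = c(2004, 2005, 2007, 2008, 2009, 2010),
  meds_sum = c(1, 4, 2, 4, 0, 3)
)

result <- counts %>%
  arrange(ID, year) %>%
  group_by(ID) %>%   # neighbours are looked up within the same ID only
  mutate(
    # TRUE only if the previous/next row is the adjacent calendar year
    # and that year has at least 4 medication records
    prev_hit = !is.na(lag(year)) & lag(year) == year - 1 & lag(meds_sum) >= 4,
    next_hit = !is.na(lead(year)) & lead(year) == year + 1 & lead(meds_sum) >= 4,
    gout2 = as.numeric(meds_sum >= 4 | (meds_sum >= 1 & (prev_hit | next_hit)))
  ) %>%
  ungroup() %>%
  select(-prev_hit, -next_hit)
```

Checking `lag(year) == year - 1` rather than just the neighbouring row matters when an ID has gaps between years, as with the 2005 to 2007 jump for "A1" in the toy data.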