I have a dataset like the one below. date_e is accurate only for status = 1. I want to simulate date_e based on age, so new_date_e should be newly generated for status = 0 and stay equal to date_e for status = 1. Also, status = 1 has higher risk, so the difference date_e - age should be shorter on average for status = 1 than for status = 0.
         age     date_e status id
1 1950-10-21 2008-11-02      0  1
2 1941-02-11 2006-08-28      0  2
3 1940-01-20 2000-05-25      0  3
4 1957-11-05 2008-03-28      1  4
5 1946-09-15 2004-03-10      0  5
The data can be generated with:
library(dplyr)
set.seed(1)
age <- sample(seq(as.Date('1930-01-01'), as.Date('1970-01-01'), by="day"), 1000)
date_e <- sample(seq(as.Date('2000-01-01'), as.Date('2010-01-01'), by="day"), 1000)
status <- sample(c(0, 1), size = 1000, replace = TRUE, prob = c(0.8, 0.2))
df <- data.frame(age, date_e, status)
df <- df %>% mutate(id = row_number())
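
To make the goal concrete, here is a minimal sketch of one possible way to get such a new_date_e. It is not a definitive method. It assumes the event-date column is named date_e as in the printed table, and it uses base R only so that it runs on its own. The normal distribution for the status = 0 durations, the hypothetical shift_days value of about 5 years, and the helper names dur1, is0 and sim_dur are all illustrative assumptions and are not part of the original data.

```r
set.seed(1)
n <- 1000

# Rebuild the example data (same ranges and probabilities as above)
days_birth <- seq(as.Date("1930-01-01"), as.Date("1970-01-01"), by = "day")
days_event <- seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day")
df <- data.frame(
  age    = sample(days_birth, n),
  date_e = sample(days_event, n),
  status = sample(c(0, 1), size = n, replace = TRUE, prob = c(0.8, 0.2))
)
df$id <- seq_len(n)

# Observed time from age to date_e (in days) for the accurate rows, status == 1
dur1 <- as.numeric(df$date_e[df$status == 1] - df$age[df$status == 1])

# Assumption: durations for status == 0 are normal, with the same spread as
# status == 1 but a mean that is shift_days longer (hypothetical value)
shift_days <- 5 * 365.25
is0 <- df$status == 0
sim_dur <- round(rnorm(sum(is0), mean = mean(dur1) + shift_days, sd = sd(dur1)))

# Keep date_e where status == 1, replace it by age + simulated duration otherwise
df$new_date_e <- df$date_e
df$new_date_e[is0] <- df$age[is0] + sim_dur

# Check: mean duration (in days) per status group
df$dur_new <- as.numeric(df$new_date_e - df$age)
tapply(df$dur_new, df$status, mean)
```

Note that with this sketch the simulated dates for status = 0 are not restricted to the original 2000 to 2010 window. If that matters, the draw would need to be truncated or rescaled.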