I need to calculate the number of occurrences of specific events over specified time periods. Assume I have the following data:
set.seed(1453)
id = c(1,1,1,1,1,2,2,2,2,2)
x_1 = sample(0:1, 10, TRUE)
x_5 = sample(0:1, 10, TRUE)
date = c('2016-01-01',
'2016-02-01',
'2016-02-23',
'2016-03-04',
'2016-04-01',
'2016-01-01',
'2016-02-01',
'2016-02-23',
'2016-03-04',
'2016-04-01'
)
df = data.frame(id,date=as.Date(date),snapshot_date = as.Date(date)+1,x_1,x_5)
TABLE 1. (INPUT)
id date snapshot_date x_1 x_5
1 2016-01-01 2016-01-02 1 0
1 2016-02-01 2016-02-02 0 1
1 2016-02-23 2016-02-24 1 1
1 2016-03-04 2016-03-05 0 0
1 2016-04-01 2016-04-02 0 1
2 2016-01-01 2016-01-02 1 1
2 2016-02-01 2016-02-02 1 0
2 2016-02-23 2016-02-24 0 0
2 2016-03-04 2016-03-05 0 0
2 2016-04-01 2016-04-02 1 1
I need to calculate how many times x_1=1 and x_5=1 occurred within the last 3 months (per each month). So I first create dummy variables: if x_1 =1, then x_1_n = TRUE. Otherwise, x_1_n = FALSE. Similarly, for x_5_n. I also create the dates for three months going backwards.
df$x_1_n <- ifelse((df$x_1 ==1), TRUE, FALSE)
df$x_5_n <- ifelse(df$x_5==1, TRUE, FALSE)
library(lubridate)
for (i in 1:3) {
DATE_MO <- as.Date(df$snapshot_date) %m-% months(i)
df[,paste0("DATE_MO", i)] <- DATE_MO
}
I have variables x_1, x_5. I need to write a loop that goes through all of those variables x_1,x_5 and calculate the number of occurrences between certain dates. The original code runs and is correct. But I want to see how to simplify it using the for loop so that I don't have to manually copy-paste the code for each x_1 and x_5 as the number of x_'s and dates are bigger in the original version.
library(data.table)
df <- data.table(df)
df[,c("x1_dminus_mo1",
"x1_dminus_mo2",
"x1_dminus_mo3"
) :=. (df[x_1_n][df[,.(id,DATE_MO1,snapshot_date)], on=.
(id, date >= DATE_MO1, date < snapshot_date), .N, by = .EACHI] $N
,
df[x_1_n][df[,.(id,DATE_MO2, DATE_MO1)], on=.
(id, date >= DATE_MO2, date < DATE_MO1), .N, by = .EACHI] $N
,
df[x_1_n][df[,.(id,DATE_MO3, DATE_MO2)], on=.
(id, date >= DATE_MO3, date < DATE_MO2), .N, by = .EACHI] $N
)]
df[,c("x5_dminus_mo1",
"x5_dminus_mo2",
"x5_dminus_mo3"
) :=. (df[x_5_n][df[,.(id,DATE_MO1,snapshot_date)], on=.
(id, date >= DATE_MO1, date < snapshot_date), .N, by = .EACHI] $N
,
df[x_5_n][df[,.(id,DATE_MO2, DATE_MO1)], on=.
(id, date >= DATE_MO2, date < DATE_MO1), .N, by = .EACHI] $N
,
df[x_5_n][df[,.(id,DATE_MO3, DATE_MO2)], on=.
(id, date >= DATE_MO3, date < DATE_MO2), .N, by = .EACHI] $N
)]
I want to get the following table but using the loop.
TABLE 2 (OUTPUT)
df[,c(1,2,4,11,12,13)]
id date x_1 x1_dminus_mo1 x1_dminus_mo2 x1_dminus_mo3
1 2016-01-01 1 1 0 0
1 2016-02-01 0 0 1 0
1 2016-02-23 1 1 1 0
1 2016-03-04 0 1 0 1
1 2016-04-01 0 0 1 0
2 2016-01-01 1 1 0 0
2 2016-02-01 1 1 1 0
2 2016-02-23 0 1 1 0
2 2016-03-04 0 0 1 1
2 2016-04-01 1 1 0 1