I'm trying to build a retained cohort table, but I'm having trouble setting it up so that it returns the distinct count of ids for each period independent of when they were first seen (that is, not tied to the first period in which they appear in the data).
Everything I've tried so far mostly gives me the new users in each period, rather than everyone who appeared in the period regardless of whether they also appeared in a previous period.
Ex: For this set of values:
id quarter
7 Q1
7 Q1
5 Q1
8 Q1
3 Q1
6 Q1
10 Q1
3 Q2
10 Q2
8 Q2
2 Q2
7 Q2
6 Q2
6 Q3
9 Q3
6 Q3
4 Q3
9 Q3
2 Q3
5 Q4
8 Q4
10 Q4
7 Q4
1 Q4
8 Q4
Instead of this (a user is only counted in a cohort if it's the first time they are seen in the data):
# [,1] [,2] [,3] [,4]
#Q1 6 5 1 5
#Q2 1 1 0 0
#Q3 2 0 0 0
#Q4 1 0 0 0
I want this (a user is counted in a cohort if they appeared in that period, regardless of whether it's the first time):
# [,1] [,2] [,3] [,4]
#Q1 6 5 1 4
#Q2 6 2 3 0
#Q3 4 0 0 0
#Q4 5 0 0 0
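To pin down the definition: the cell in row i, column k is meant to be the number of distinct ids seen in quarter i that are also seen in quarter i + k - 1, so column 1 is just the distinct count for that quarter. A minimal base R sketch that checks the Q1 row by hand (the `ids` list and `q1_row` are illustrative names, with the ids typed out from the example above):

```r
# Distinct ids per quarter, copied from the example data above
ids <- list(
  Q1 = unique(c(7, 7, 5, 8, 3, 6, 10)),
  Q2 = unique(c(3, 10, 8, 2, 7, 6)),
  Q3 = unique(c(6, 9, 6, 4, 9, 2)),
  Q4 = unique(c(5, 8, 10, 7, 1, 8))
)

# Row Q1: how many of the ids seen in Q1 are also seen in Q1, Q2, Q3, Q4
q1_row <- sapply(ids, function(later) length(intersect(ids$Q1, later)))
print(q1_row)
```

The first entry is Q1 intersected with itself, which is why column 1 equals the plain distinct count.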
What I've tried:
library(data.table)

test <- list(id = c(7, 7, 5, 8, 3, 6, 10, 3, 10, 8, 2, 7, 6,
                    6, 9, 6, 4, 9, 2, 5, 8, 10, 7, 1, 8),
             quarter = c("Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q1",
                         "Q2", "Q2", "Q2", "Q2", "Q2", "Q2",
                         "Q3", "Q3", "Q3", "Q3", "Q3", "Q3",
                         "Q4", "Q4", "Q4", "Q4", "Q4", "Q4"))
test <- as.data.table(test)
quarts <- sort(unique(test$quarter))
test$occur <- 1

# One row per id, one column per quarter; values above 1 are capped at 1
mat <- dcast.data.table(test, id ~ quarter, value.var = "occur", fun.aggregate = sum)
mat[mat > 1] <- 1
mat <- as.data.frame(mat)

# Result table: one row and one column per quarter
res2 <- matrix(0, nrow = ncol(mat) - 1, ncol = ncol(mat) - 1)
res2 <- as.data.frame(res2)

# First column: number of distinct ids seen in each quarter
for (i in 2:ncol(mat)) {
  res2[i - 1, 1] <- sum(mat[, i])
}

# Attempt to fill the remaining columns with ids also seen in later quarters
for (i in 2:ncol(mat)) {
  for (t in 1:nrow(mat)) {
    if (mat[t, i] > 0) {
      res2[i - 1, i] <- res2[i - 1, i] + mat[i, i + 1]
    }
  }
}
But it gives me an error. Would appreciate any suggestion. Thank you!
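One possible approach, offered as a sketch rather than a definitive fix: the loop above indexes column `i` of `res2` and column `i + 1` of `mat`, both of which run past the last column once `i` reaches `ncol(mat)`, and it reads `mat[i, ...]` where the row index is presumably meant to be `t`. Assuming the goal is the table under "I want this", the counts can be taken from the cross-product of a 0/1 presence matrix, using base R only. The names `present`, `overlap`, and `res` are illustrative, not from the original code.

```r
# Example data from the question
test <- data.frame(
  id = c(7, 7, 5, 8, 3, 6, 10, 3, 10, 8, 2, 7, 6,
         6, 9, 6, 4, 9, 2, 5, 8, 10, 7, 1, 8),
  quarter = rep(c("Q1", "Q2", "Q3", "Q4"), times = c(7, 6, 6, 6))
)

quarters <- sort(unique(test$quarter))
ids <- sort(unique(test$id))

# 0/1 presence matrix: one row per id, one column per quarter
present <- matrix(0, nrow = length(ids), ncol = length(quarters),
                  dimnames = list(as.character(ids), quarters))
present[cbind(match(test$id, ids), match(test$quarter, quarters))] <- 1

# overlap[i, j] = number of distinct ids seen in both quarter i and quarter j
overlap <- crossprod(present)

# Shift each row left so column k holds the overlap with the quarter k - 1 steps later
n <- length(quarters)
res <- matrix(0, n, n, dimnames = list(quarters, NULL))
for (i in seq_len(n)) {
  later <- i:n
  res[i, seq_along(later)] <- overlap[i, later]
}
print(res)
```

For the example data this reproduces the desired table. Because `overlap` is symmetric, only its upper triangle is used, and the unused cells in `res` stay at 0.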