I have a dataset which looks like this:
date, brandname, status, case number
2017-01-01, x1, closed, 12345
2017-01-01, x2, closed, 12345
2017-01-01, x3, closed, 12345
2017-01-02, x4, open, 7864
2017-01-03, x5, open, 78642
...
There can be double dates and there can be gaps between dates (missing days)
Per day I want to count the number of rows (cases) that have occurred in the past 30 days.
I succeeded in doing this the following way:
dataframe <-
structure(list(date = structure(c(17167, 17167, 17167, 17168,
17169), class = "Date"), brandname = c("x1", "x2", "x3", "x4",
"x5"), status = c("closed", "closed", "closed", "open", "open"
), `case number` = c(12345L, 12345L, 12345L, 7864L, 78642L)), .Names = c("date",
"brandname", "status", "case number"), row.names = c(NA, -5L), class = "data.frame")
for(i in 1:nrow(dataframe)) {
frame <- subset(dataframe, date > dataframe$date[i] - 30L &
date < dataframe$date[i])
dataframe$count[i] <- nrow(frame)
}
This gives me the exact result I need:
dataframe
date brandname status case number count
1 2017-01-01 x1 closed 12345 0
2 2017-01-01 x2 closed 12345 0
3 2017-01-01 x3 closed 12345 0
4 2017-01-02 x4 open 7864 3
5 2017-01-03 x5 open 78642 4
But I am sure there is a more efficient way of doing this. Can anyone help?