I have a data frame dat holding hourly traffic count data in dat$x1, and a column dat$x48 that should hold the sum of dat$x1 over the 48 rows starting at the current row (the current row plus the next 47). This is the 48-hour volume for the period starting in the hour represented by the row.
I tried a for loop, which was very slow. My research suggests the for loop was a bad idea, and that I should be looking at one of the apply functions instead. However, I could not figure out how to use the apply functions for this purpose after checking introductions to them.
Here is the for loop that I tried, which was too slow:
for (i in seq_len(nrow(dat))) {
  # x1 is in column 8 and x48 is in column 15
  # note the parentheses: i:i+47 parses as (i:i)+47, i.e. the single row i+47
  dat[i, 15] <- sum(dat[i:(i + 47), 8])
}
For a simplified example, where I want only 4-hour sums, the data would start like this:
x1 x4
1 NA
5 NA
3 NA
8 NA
6 NA
2 NA
1 NA
1 NA
...
I want the data frame to end up like this, where x4 is the sum of the corresponding x1 value and the next 3 x1 values:
x1 x4
1 17
5 22
3 19
8 17
6 10
...
The data frame has 2 million rows.
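
To make the target calculation concrete, here is a minimal sketch of the forward-looking sum written without a loop, using a cumulative sum in base R. The helper name forward_sum is hypothetical and not part of my actual code, and the sketch assumes x1 contains no missing values (an NA would propagate through cumsum to every later row). Rows that do not have a full window ahead of them are left as NA.

```r
# Hypothetical helper: sum of x[i], x[i+1], ..., x[i+k-1] for every row i.
# x: numeric vector of hourly counts; k: window length (48 for the real data).
forward_sum <- function(x, k) {
  n <- length(x)
  # as.numeric() avoids integer overflow when cumulating millions of counts
  cs <- c(0, cumsum(as.numeric(x)))   # cs[j + 1] holds the sum of x[1..j]
  out <- rep(NA_real_, n)             # rows without k full hours stay NA
  if (n >= k) {
    starts <- seq_len(n - k + 1)
    # window sum = cumulative sum at window end minus cumulative sum before start
    out[starts] <- cs[starts + k] - cs[starts]
  }
  out
}

# The simplified 4-hour example from above
x1 <- c(1, 5, 3, 8, 6, 2, 1, 1)
x4 <- forward_sum(x1, 4)
```

On the real data this would be called as dat$x48 <- forward_sum(dat$x1, 48). Because it makes a single vectorized pass over the column, it does not index into the data frame once per row the way the loop does.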