
I want to compute cumulative sums over time windows of 3 and 7 days (rolling sums), subject to several conditions, on a very large data frame.

This is an example of my data frame:

Date     X   
1.1.18   0   
2.1.18   0   
3.1.18   0   
4.1.18   NA   
5.1.18   0.3   
6.1.18   NA 
7.1.18   0   
8.1.18   NA  
9.1.18   NA 
10.1.18  NA
11.1.18  0
12.1.18  0.9
13.1.18  0.2
14.1.18  0.2
15.1.18  NA
16.1.18  0.3 
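To reproduce the problem, the example above can be built as an R data frame. This is a sketch that assumes the dates are day.month.two-digit-year (which `16.1.18` suggests). It parses them into proper `Date` objects with `as.Date()` and keeps the object name `data` used in the code further down.

```r
# Example data from the question.
# Assumption: dates are day.month.two-digit-year, e.g. "16.1.18" = 16 Jan 2018.
data <- data.frame(
  Date = as.Date(c("1.1.18", "2.1.18", "3.1.18", "4.1.18", "5.1.18",
                   "6.1.18", "7.1.18", "8.1.18", "9.1.18", "10.1.18",
                   "11.1.18", "12.1.18", "13.1.18", "14.1.18", "15.1.18",
                   "16.1.18"),
                 format = "%d.%m.%y"),
  X = c(0, 0, 0, NA, 0.3, NA, 0, NA, NA, NA, 0, 0.9, 0.2, 0.2, NA, 0.3)
)

# Rolling by row count equals rolling by days only if no calendar day is missing
stopifnot(all(diff(data$Date) == 1))
```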

The cumulative sums should be calculated from the X column, using the X values of the 3 (or 7) days before each date, not including that date itself. The conditions are:

(1) If all X-values of the time window are NA -> the CumSum should be NA.

(2) If all X-values of the time window are 0 -> the CumSum should be 0.

(3) If the X-values of the time window are a mix of 0 and NA (no value >0) -> the CumSum should be NA.

(4) If all X-values of the time window are >0 -> the values should be summed.

(5) If all X-values of the time window are either >0 or NA -> the non-NA values should be summed.

(6) If all X-values of the time window are either >0 or 0 -> the values should be summed.
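Read together with the expected output below (for example, the window 0, NA, 0.3 covering 3.1.18 to 5.1.18 gives 0.3 on 6.1.18), the six rules seem to reduce to three cases: if the window contains any value > 0, sum the non-NA values; if every value is 0, return 0; otherwise (only NAs, or only 0s and NAs) return NA. A minimal sketch of that per-window rule follows; the name `window_sum` is hypothetical, and it assumes X is never negative.

```r
# Hypothetical helper: applies rules (1)-(6) to one window of X values.
# Assumption: X is never negative (the rules only mention 0, > 0 and NA).
window_sum <- function(x) {
  if (any(x > 0, na.rm = TRUE)) {
    sum(x, na.rm = TRUE)   # rules (4)-(6): at least one value > 0
  } else if (!anyNA(x)) {
    0                      # rule (2): every value is 0
  } else {
    NA_real_               # rules (1) and (3): only NA, or only 0 and NA
  }
}

window_sum(c(0, 0, 0))     # 0
window_sum(c(NA, NA, NA))  # NA
window_sum(c(0, NA, 0))    # NA
window_sum(c(0, NA, 0.3))  # 0.3
```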

The result should look like this:

Date     X    3CumSumX   7CumSumX   
1.1.18   0       NA         NA
2.1.18   0       NA         NA 
3.1.18   0       NA         NA
4.1.18   NA      0          NA
5.1.18   0.3     NA         NA   
6.1.18   NA      0.3        NA
7.1.18   0       0.3        NA  
8.1.18   NA      0.3        0.3
9.1.18   NA      NA         0.3
10.1.18  NA      NA         0.3
11.1.18  0       NA         0.3
12.1.18  0.9     NA         0.3
13.1.18  0.2     0.9        0.9
14.1.18  0.2     1.1        1.1
15.1.18  NA      1.3        1.3
16.1.18  0.3     0.4        1.3
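In this table each row's window covers the 3 (or 7) rows before it, excluding the row itself: 4.1.18 gets 0 from the three zeros on 1.1.18 to 3.1.18. For a very large data frame, one way to get these values without calling an R function once per window is to take differences of running totals from `cumsum()`. This is a sketch of that swapped-in technique, not the asker's method; `prev_window_sum` is a hypothetical name, and it assumes one row per consecutive day and non-negative X.

```r
# Hypothetical helper: windowed sum over the n rows *before* each row,
# applying the question's rules via running totals (linear in length(x)).
prev_window_sum <- function(x, n) {
  len <- length(x)
  is_pos <- !is.na(x) & x > 0
  # Prefix sums with a leading 0: window (i - n)..(i - 1) is cs[i] - cs[i - n]
  cs_val <- c(0, cumsum(ifelse(is_pos, x, 0)))  # running sum of values > 0
  cs_pos <- c(0, cumsum(is_pos))                # running count of values > 0
  cs_na  <- c(0, cumsum(is.na(x)))              # running count of NAs
  out <- rep(NA_real_, len)
  if (len <= n) return(out)                     # no row has n previous rows
  i <- (n + 1):len
  s_val <- cs_val[i] - cs_val[i - n]
  n_pos <- cs_pos[i] - cs_pos[i - n]
  n_na  <- cs_na[i]  - cs_na[i - n]
  # any value > 0 -> sum; no NA (so all zeros) -> 0; otherwise -> NA
  out[i] <- ifelse(n_pos > 0, s_val, ifelse(n_na == 0, 0, NA_real_))
  out
}

X <- c(0, 0, 0, NA, 0.3, NA, 0, NA, NA, NA, 0, 0.9, 0.2, 0.2, NA, 0.3)
round(prev_window_sum(X, 3), 10)  # the 3CumSumX column above
round(prev_window_sum(X, 7), 10)  # the 7CumSumX column above
```

Because the sums come from differences of running totals, results can carry tiny floating-point errors (hence the `round()`), and on very long series the running totals grow large, so rounding matters more there.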

What I have so far is the following code, but it does not satisfy conditions (1) and (3):


library(zoo)              # rollapply() comes from the zoo package

data$`3CumSumX` <- NA     # column for 3-day cumulative values
data$`7CumSumX` <- NA     # column for 7-day cumulative values

data[17,] <- NA           # additional row for the window that ends on the last day

# each window covers the previous 3 (or 7) days, so results start one row later
data[4:17,3] <- rollapply(data[1:16,2], width=3, FUN=sum, na.rm=TRUE)
data[8:17,4] <- rollapply(data[1:16,2], width=7, FUN=sum, na.rm=TRUE)

Unfortunately I still have no clue how to include my conditions, so any help would be appreciated.
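One possible way to build the conditions into the existing `rollapply()` calls is to replace `FUN=sum, na.rm=TRUE` with a function that applies the rules to each window. This is a sketch assuming the `zoo` package; `window_sum` is a hypothetical helper that returns the sum if the window has any value > 0, 0 if all values are 0, and NA otherwise (the reading of rules (1) to (6) that matches the expected table).

```r
library(zoo)  # provides rollapply()

# Example data from the question (dates kept as text here for brevity)
data <- data.frame(
  Date = paste0(1:16, ".1.18"),
  X = c(0, 0, 0, NA, 0.3, NA, 0, NA, NA, NA, 0, 0.9, 0.2, 0.2, NA, 0.3)
)

# Hypothetical helper: rules (1)-(6) for one window; assumes X is never negative
window_sum <- function(x) {
  if (any(x > 0, na.rm = TRUE)) {
    sum(x, na.rm = TRUE)   # some value > 0
  } else if (!anyNA(x)) {
    0                      # all values are 0
  } else {
    NA_real_               # only NA, or a mix of 0 and NA
  }
}

data$`3CumSumX` <- NA_real_   # 3-day window values
data$`7CumSumX` <- NA_real_   # 7-day window values
data[17, ] <- NA              # extra row for the window ending on the last day

# Same layout as the original attempt; only FUN changes
data[4:17, 3] <- rollapply(data$X[1:16], width = 3, FUN = window_sum)
data[8:17, 4] <- rollapply(data$X[1:16], width = 7, FUN = window_sum)
data
```

Since `window_sum` runs once per window, this may be slow on a very large data frame; it mainly shows where the conditions plug in.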

user0405
  • You can simplify your rules significantly: `X[X==0] <- NA`, then sum normally with `na.rm=TRUE`. (To preserve the zeroes, this assignment can be done in a temp variable.) – r2evans Oct 24 '18 at 17:19
  • But you still have an upstream problem with your data, since you do not have proper `Date`s in your data. When you include your previous efforts, I suggest you change your `Date` column to proper `POSIXct` or `Date` (as in `as.Date(...)`) objects. – r2evans Oct 24 '18 at 17:20
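A quick check of the first comment's suggestion on single windows: after turning 0 into NA in a temporary copy, `sum(..., na.rm = TRUE)` returns 0 for an all-NA window as well as for an all-zero one, so on its own it satisfies rule (2) but not rules (1) and (3). The sketch below uses hypothetical function names and adds one possible extra check, which is an assumption and not part of the comment.

```r
# The comment's idea applied to one window: treat 0 as missing, then sum.
# The temporary copy keeps the original zeros intact.
zero_as_na_sum <- function(x) {
  tmp <- ifelse(x == 0, NA, x)
  sum(tmp, na.rm = TRUE)
}

zero_as_na_sum(c(0, 0, 0))     # 0, as rule (2) requires
zero_as_na_sum(c(NA, NA, NA))  # 0, but rule (1) wants NA
zero_as_na_sum(c(0, NA, 0))    # 0, but rule (3) wants NA

# Hypothetical extra check: NA whenever the window has an NA but no value > 0
zero_as_na_sum_checked <- function(x) {
  if (anyNA(x) && !any(x > 0, na.rm = TRUE)) NA_real_ else zero_as_na_sum(x)
}
```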
