1

Wondering if anyone can suggest a more succinct alternative to achieve the same outcome as below?

I have a dataframe with three columns date, time, and rain. It is a dataset of hourly rainfall records. I want to create 96 new columns. Each new column should return values that are the result of a rolling sum calculation performed on col rain.

The new columns and the desired rolling sums follow this pattern:

  • rain2 = sum the current value in col rain with the value that precedes it
  • rain3 = sum the current value in col rain with the two values that precede it in col rain
  • .... and so on...
  • rain96 = sum the current value in col rain with the 95 values that precede it

I have achieved the desired result using mutate and RcppRoll::roll_sum. But this approach requires copy and pasting 96 lines of code.

Here's the code to create a sample dataset:

df <- tibble(date = as_date(10), time = 9:18, rain = c(2,0,0,3,4,1,15,4,0,0.5))

Here's the sample dataset:

      date     time rain 
 1 1970-01-11     9   2  
 2 1970-01-11    10   0  
 3 1970-01-11    11   0  
 4 1970-01-11    12   3  
 5 1970-01-11    13   4  
 6 1970-01-11    14   1  
 7 1970-01-11    15  15  
 8 1970-01-11    16   4  
 9 1970-01-11    17   0  
10 1970-01-11    18   0.5

Here is the desired result (noting for simplicity I removed cols rain4-rain9 from output below):

      date     time rain rain2 rain3...  rain10
 1 1970-01-11     9   2   na   na          na
 2 1970-01-11    10   0   2    na          na
 3 1970-01-11    11   0   0    2           na
 4 1970-01-11    12   3   3    3           na
 5 1970-01-11    13   4   7    7           na
 6 1970-01-11    14   1   5    8           na
 7 1970-01-11    15  15   16   20          na
 8 1970-01-11    16   4   19   20          na
 9 1970-01-11    17   0   4    19          29
10 1970-01-11    18   0.5 0.5  4.5         29.5

Here is my current solution which works but requires copy and pasting 96 lines of code when applying to my full dataset:

df_new <- df %>% mutate(rain2 = roll_sum(rain,2, fill=NA, align="right"),
                        rain3 = roll_sum(rain,3, fill=NA, align="right"),
                        rain4 = roll_sum(rain,4, fill=NA, align="right"),
                        rain5 = roll_sum(rain,5, fill=NA, align="right"),
                        rain6 = roll_sum(rain,6, fill=NA, align="right"),
                        rain7 = roll_sum(rain,7, fill=NA, align="right"),
                        rain8 = roll_sum(rain,8, fill=NA, align="right"),
                        rain9 = roll_sum(rain,9, fill=NA, align="right"),
                        rain10 = roll_sum(rain,10, fill=NA, align="right"))

Thank you!

nomis
  • 11
  • 2
  • Related [Compute all fixed window averages with dplyr and RcppRoll](https://stackoverflow.com/questions/49909528/compute-all-fixed-window-averages-with-dplyr-and-rcpproll). You can also use `data.table::frollsum` which allows multiple window sizes: `library(data.table)`; `setDT(df)`; `n = seq(3) + 1`; `df[ , paste0("rain", n) := frollsum(rain, n)]`. – Henrik Dec 04 '21 at 10:21
  • `for(i in 2:5){ df[[paste0("rain", i)]] <- roll_sum(df$rain, n = i, fill=NA, align="right") }` like in the accepted answer from the link above – phiver Dec 04 '21 at 10:30

0 Answers0