8

I would like to calculate the rainfall that has fallen over the last three days for each grid square, and add this as a new column in my data.table. To be clear, I want to sum up the current and PREVIOUS two (2) days of rainfall, for each meterological grid square

library ( zoo )
library (data.table)


# making the data.table
rain           <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10)  # rainfall values to work with
square         <- c(1,1,1,1,1,1,1,1,1,2)               # the geographic grid square for the rainfall measurement
desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA )  # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square)
weather <- data.table(rain, square, desired_result)  # making the data.table

My attempt to answer: this line used to work, but no longer does

weather[, rain_3 := filter(rain, rep(1, 2), sides = 1), by = list(square)]  

So here I am trying another method:

# this next line gets the numbers right, but sums the following values, not the preceeding ones. 
weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2,0)), sum)

# here I add in the by weather$ square, but still no success
weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2,0)), sum, by= list(weather$square))

I would greatly appreciate any insights or suggestions you may have.

Many thanks!

threeisles
  • 301
  • 2
  • 8

6 Answers6

24

Here's a quick and efficient solution using the latest data.table version (v 1.9.6+)

weather[, rain_3 := Reduce(`+`, shift(rain, 0:2)), by = square]
weather
#     rain square desired_result rain_3
#  1:   NA      1             NA     NA
#  2:   NA      1             NA     NA
#  3:   NA      1             NA     NA
#  4:    0      1             NA     NA
#  5:    0      1             NA     NA
#  6:    5      1              5      5
#  7:    1      1              6      6
#  8:    0      1              6      6
#  9:    3      1              4      4
# 10:   10      2             NA     NA

The basic idea here is to shift the rain column twice and then sum up the rows.

talat
  • 68,970
  • 21
  • 126
  • 157
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
3

The rollapply solution would be done like this:

weather[, rain_3 := rollapplyr(rain, 3, sum, fill = NA_real_), by = square]

giving:

    rain square desired_result rain_3
 1:   NA      1             NA     NA
 2:   NA      1             NA     NA
 3:   NA      1             NA     NA
 4:    0      1             NA     NA
 5:    0      1             NA     NA
 6:    5      1              5      5
 7:    1      1              6      6
 8:    0      1              6      6
 9:    3      1              4      4
10:   10      2             NA     NA

Update

Have simplified based on version of zoo that came out since this question was originally asked.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
2
weather[, rain_3 := filter(rain, rep(1, 3), sides = 1), by = list(square)]  
#Error in filter(rain, rep(1, 3), sides = 1) : 
#  'filter' is longer than time series
weather[, rain_3 := if(.N > 2) filter(rain, rep(1, 3), sides = 1) else NA_real_, 
        by = square] 
#    rain square desired_result rain_3
# 1:   NA      1             NA     NA
# 2:   NA      1             NA     NA
# 3:   NA      1             NA     NA
# 4:    0      1             NA     NA
# 5:    0      1             NA     NA
# 6:    5      1              5      5
# 7:    1      1              6      6
# 8:    0      1              6      6
# 9:    3      1              4      4
#10:   10      2             NA     NA

Take care that dplyr is not loaded because it masks filter. If you need dplyr, you can call stats::filter explicitly.

Roland
  • 127,288
  • 10
  • 191
  • 288
  • 1
    You were spot on that it was dplyr causing that issue. by using 'detach("package:dplyr", unload=TRUE)' my original code began to work once more. I really appreciate your insight. – threeisles Sep 18 '15 at 12:52
2

You have almost got the answer yourself. rollsum (or rollapply in your case) gives you the vector of length N-2, so you just have to fill the desired cells with NAs. It can be simply done like this: roll<-c(NA,NA,rollsum(yourvector,k=3))

Here is how I do it. I am using roll_sum from {RcppRoll} package, because it is much faster and deals with NAs easier. Simple by argument from data.table lets you group result by square.

library(RcppRoll)
weather[,rain_3:=if(.N>2){c(NA,NA,roll_sum(rain,n=3))}else{NA},by=square]
weather

    rain square desired_result rain_3
 1:   NA      1             NA     NA
 2:   NA      1             NA     NA
 3:   NA      1             NA     NA
 4:    0      1             NA     NA
 5:    0      1             NA     NA
 6:    5      1              5      5
 7:    1      1              6      6
 8:    0      1              6      6
 9:    3      1              4      4
10:   10      2             NA     NA
Maksim Gayduk
  • 1,051
  • 6
  • 13
2

Late to the party, but a more recent version of data.table package (1.12.8 for me) has frollsum function that will accomplish this a bit more cleanly than earlier (but very much valid) answers:

library (data.table)

# making the data.table
rain           <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10)  # rainfall values to work with
square         <- c(1,1,1,1,1,1,1,1,1,2)               # the geographic grid square for the rainfall measurement
desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA )  # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square)
weather <- data.table(rain, square, desired_result)  # making the data.table

# using `frollsum`
weather[, rain3 := frollsum(rain, n = 3), by = square][]
#>     rain square desired_result rain3
#>  1:   NA      1             NA    NA
#>  2:   NA      1             NA    NA
#>  3:   NA      1             NA    NA
#>  4:    0      1             NA    NA
#>  5:    0      1             NA    NA
#>  6:    5      1              5     5
#>  7:    1      1              6     6
#>  8:    0      1              6     6
#>  9:    3      1              4     4
#> 10:   10      2             NA    NA

Created on 2020-07-09 by the reprex package (v0.3.0)

zack
  • 5,205
  • 1
  • 19
  • 25
1

A dplyr solution:

library(dplyr)
weather %>% 
  group_by(square) %>% 
  mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L))

Result:

Source: local data table [10 x 4]

    rain square desired_result rain_3
   (dbl)  (dbl)          (dbl) (dbl)
1     NA      1             NA    NA
2     NA      1             NA    NA
3     NA      1             NA    NA
4      0      1             NA    NA
5      0      1             NA    NA
6      5      1              5     5
7      1      1              6     6
8      0      1              6     6
9      3      1              4     4
10    10      2             NA    NA

If you want to assign rain3 to your dataset, you can use the %<>% symbol from maggritr in your pipe:

library(magrittr)
weather %<>%
  group_by......
donlelek
  • 783
  • 4
  • 13