-3

I have monthly rainfall data for a location for a period of 150 yrs predicted by 22 different models. The columns look like: Year Month Model1 Model2 Model3 .....Model 22

The data look like:

1950 1 75 90 120...67
1950 2 .. .. .. .. ..
1950 (repeat 12 times for 12 months of the year)
1951
...
2099

I am interested in extracting only the June through August data by model and then finding the sum for these 3 months. I've used the subset feature before and the melt feature (reshape package) but I'm not sure how to combine the two. Any help would be appreciated. Thanks, V

David Arenburg
  • 91,361
  • 17
  • 137
  • 196

2 Answers2

2

This should get you there: Using dplyr. Using akrun's data

df %>%
  filter(Month %in% 6:8) %>%
  group_by(Year) %>%
  summarise_each(funs(sum),-Month)


  Year     Model1     Model2    Model3
1 1950  1.3107384 -0.3295989 -1.521790
2 1951 -3.7768090  1.0092107  1.737716
3 1952  0.5202926  2.6768969  1.890024
Andrew Taylor
  • 3,438
  • 1
  • 26
  • 47
2

Here's one way using data.table:

library(data.table)
setDT(df)[Month %in% 6:8, lapply(.SD, sum), by=Year, .SDcols= -"Month"]
#   Year     Model1     Model2    Model3
#1: 1950  1.3107384 -0.3295989 -1.521790
#2: 1951 -3.7768090  1.0092107  1.737716
#3: 1952  0.5202926  2.6768969  1.890024

We obtain those row indices where Month is 6, 7 or 8, and on that subset, we compute mean of all columns except Month (specified in .SDcols) while grouping by Year.

Additionally, we can also setkey on the data.table, that speeds up the subset operation tremendously with the help of binary search - see vignettes.

setkey(setDT(df), Month) ## reorders `df` by `Month`
df[J(6:8), lapply(.SD, sum), by=Year, .SDcols= -"Month"]

Note that just the first part changed from Month %in% 6:8 to J(6:8). The latter performs a binary search based subset on the key column Month. Everything else is intact.

If you do repeated subsetting or work with very large data sets, keying data.tables and subsetting using binary search would improve performance tremendously. Refer to the vignettes for more info.

data

set.seed(42)
df <- data.frame(Year= rep(1950:1952,each=12), Month=rep(1:12,3), 
            Model1=rnorm(36), Model2=rnorm(36), Model3=rnorm(36))
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662