Here's one way using data.table
:
library(data.table)
setDT(df)[Month %in% 6:8, lapply(.SD, sum), by=Year, .SDcols= -"Month"]
# Year Model1 Model2 Model3
#1: 1950 1.3107384 -0.3295989 -1.521790
#2: 1951 -3.7768090 1.0092107 1.737716
#3: 1952 0.5202926 2.6768969 1.890024
We obtain those row indices where Month
is 6, 7 or 8, and on that subset, we compute mean of all columns except Month
(specified in .SDcols
) while grouping by Year
.
Additionally, we can also setkey
on the data.table, that speeds up the subset operation tremendously with the help of binary search - see vignettes.
setkey(setDT(df), Month) ## reorders `df` by `Month`
df[J(6:8), lapply(.SD, sum), by=Year, .SDcols= -"Month"]
Note that just the first part changed from Month %in% 6:8
to J(6:8)
. The latter performs a binary search based subset on the key column Month
. Everything else is intact.
If you do repeated subsetting or work with very large data sets, keying data.tables and subsetting using binary search would improve performance tremendously. Refer to the vignettes for more info.
data
set.seed(42)
df <- data.frame(Year= rep(1950:1952,each=12), Month=rep(1:12,3),
Model1=rnorm(36), Model2=rnorm(36), Model3=rnorm(36))