Applying Functions to several variables, by a number variables

Question

The data I have looks as follows:

Week_ID  County  State   date     ZCTA T_mean_F   Precipitation holiday Units
    523 Carroll  Iowa 01/01/2010 51401 5.669194             0       1     0
    523 Carroll  Iowa 01/01/2010 51430 5.757368             0       1     0
    523 Carroll  Iowa 01/01/2010 51436 5.355239             0       1     0
    523 Carroll  Iowa 01/01/2010 51440 6.055060             0       1     0
    523 Carroll  Iowa 01/01/2010 51443 5.806877             0       1     0
    523 Carroll  Iowa 01/01/2010 51444 5.995150             0       1     0
    523 Carroll  Iowa 01/01/2010 51451 5.003030             0       1     0
    523 Carroll  Iowa 01/01/2010 51455 6.342612             0       1     0
    523 Carroll  Iowa 01/01/2010 51459 5.500786             0       1     0
    523 Carroll  Iowa 01/01/2010 51463 6.303967             0       1     0

This is just the first 10 rows. The entire data set has a number of different Week_IDs and ZCTAs.

What I would like to do is to take the mean of "T_mean_F" & "Precipitation" and the sum of "Units", by ZCTA & Week_ID , preferably in one call. The end result would look something like this (just an example, not actual output):

Week_ID      ZCTA  T_mean_avg   Prep_avg  Units
  523       51401   5.669194       2        10
  524       51401   5.757368       3        12
  525       51401   5.355239       7        14

This is what I have tried:

Rollup = Wthr_UMW_dwu[,.(T_mean_avg = mean(T_mean_F),Prep_avg = mean(Precipitaton), Units=sum(Units)), by=.(ZCTA,Week_ID)]

and

Rollup_1<- aggregate(cbind(T_mean_F,Precipitation,Units) ~ ZCTA + Week_ID, data=Wthr_UMW_dwu, FUN = function(x) c(mn=mean(x), MN=mean(x), n = sum(x)))

both of which I modeled off of previous questions on this topic and both of which produce errors.

Anyone know a smooth/elegant way to pull this off?

Thanks, -Keith

Did you loaded the `library(data.table)` as the first one looks like it should work. Also convert to data.table i.e. `setDT(Wthr_UMW_dwu)[, .(T_mean_avg = mean(T_mean_F),Prep_avg = mean(Precipitation), Units=sum(Units)), by=.(ZCTA,Week_ID)]` Check the spelling of Precipitation. — akrun, Nov 01 '16 at 13:02
Provide your data as a reproducible code, not the print output of it. Also use `data.table` tag if you are using it. — jangorecki, Nov 01 '16 at 13:03
it seems that you don't want to rollup but just aggregate, as `by` grouping is single character vector to group by (single grouping grain). Rollup is meant to aggregate on various grouping grains, usually levels in a hierarchy, i.e. `list("ZCTA", c("ZCTA","Week_ID"))`. If you would add expected output, it would be more clear for readers what you want to achieve. — jangorecki, Nov 01 '16 at 13:08
Would group_by and summarise in the dplyr package not work? `Wthr_UMW_dwu%>%group_by(ZCTA,Week_ID)%>%summarise(mean(T_mean_F),Prep_avg=mean(Precipitation)...` — Morgan Ball, Nov 01 '16 at 13:10
Using just the original code I get the error: `Error in `[.data.frame`(Wthr_UMW_dwu, , .(T_mean_avg = mean(T_mean_F), : unused argument (by = .(ZCTA, Week_ID))` — Keith Siopes, Nov 01 '16 at 13:11
Because you are using `data.table` methods on a `data.frame`. Convert it to `data.table` and it should work — akrun, Nov 01 '16 at 13:12

score 1 · Answer 1 · edited Nov 01 '16 at 13:20

1

library(data.table)
setDT(x)
x[, .(
  avg.T_mean_F = mean(T_mean_F),
  avg.P = mean(Precipitation),
  s.Units = sum(Units)
), by = .(ZCTA, Week_ID)]

edited Nov 01 '16 at 13:20

jangorecki

16,384
4
79
160

answered Nov 01 '16 at 13:10

Henk

3,634
5
28
54

Applying Functions to several variables, by a number variables

1 Answers1