-2

The data I have looks as follows:

Week_ID  County  State   date     ZCTA T_mean_F   Precipitation holiday Units
    523 Carroll  Iowa 01/01/2010 51401 5.669194             0       1     0
    523 Carroll  Iowa 01/01/2010 51430 5.757368             0       1     0
    523 Carroll  Iowa 01/01/2010 51436 5.355239             0       1     0
    523 Carroll  Iowa 01/01/2010 51440 6.055060             0       1     0
    523 Carroll  Iowa 01/01/2010 51443 5.806877             0       1     0
    523 Carroll  Iowa 01/01/2010 51444 5.995150             0       1     0
    523 Carroll  Iowa 01/01/2010 51451 5.003030             0       1     0
    523 Carroll  Iowa 01/01/2010 51455 6.342612             0       1     0
    523 Carroll  Iowa 01/01/2010 51459 5.500786             0       1     0
    523 Carroll  Iowa 01/01/2010 51463 6.303967             0       1     0

This is just the first 10 rows. The entire data set has a number of different Week_IDs and ZCTAs.

What I would like to do is to take the mean of "T_mean_F" & "Precipitation" and the sum of "Units", by ZCTA & Week_ID , preferably in one call. The end result would look something like this (just an example, not actual output):

Week_ID      ZCTA  T_mean_avg   Prep_avg  Units
  523       51401   5.669194       2        10
  524       51401   5.757368       3        12
  525       51401   5.355239       7        14

This is what I have tried:

Rollup = Wthr_UMW_dwu[,.(T_mean_avg = mean(T_mean_F),Prep_avg = mean(Precipitaton), Units=sum(Units)), by=.(ZCTA,Week_ID)]

and

Rollup_1<- aggregate(cbind(T_mean_F,Precipitation,Units) ~ ZCTA + Week_ID, data=Wthr_UMW_dwu, FUN = function(x) c(mn=mean(x), MN=mean(x), n = sum(x)))

both of which I modeled off of previous questions on this topic and both of which produce errors.

Anyone know a smooth/elegant way to pull this off?

Thanks, -Keith

  • 1
    Did you loaded the `library(data.table)` as the first one looks like it should work. Also convert to data.table i.e. `setDT(Wthr_UMW_dwu)[, .(T_mean_avg = mean(T_mean_F),Prep_avg = mean(Precipitation), Units=sum(Units)), by=.(ZCTA,Week_ID)]` Check the spelling of Precipitation. – akrun Nov 01 '16 at 13:02
  • Provide your data as a reproducible code, not the print output of it. Also use `data.table` tag if you are using it. – jangorecki Nov 01 '16 at 13:03
  • it seems that you don't want to rollup but just aggregate, as `by` grouping is single character vector to group by (single grouping grain). Rollup is meant to aggregate on various grouping grains, usually levels in a hierarchy, i.e. `list("ZCTA", c("ZCTA","Week_ID"))`. If you would add expected output, it would be more clear for readers what you want to achieve. – jangorecki Nov 01 '16 at 13:08
  • Would group_by and summarise in the dplyr package not work? `Wthr_UMW_dwu%>%group_by(ZCTA,Week_ID)%>%summarise(mean(T_mean_F),Prep_avg=mean(Precipitation)...` – Morgan Ball Nov 01 '16 at 13:10
  • Using just the original code I get the error: `Error in `[.data.frame`(Wthr_UMW_dwu, , .(T_mean_avg = mean(T_mean_F), : unused argument (by = .(ZCTA, Week_ID))` – Keith Siopes Nov 01 '16 at 13:11
  • 2
    Because you are using `data.table` methods on a `data.frame`. Convert it to `data.table` and it should work – akrun Nov 01 '16 at 13:12
  • Thank you akrun. It has been solved :))) – Keith Siopes Nov 01 '16 at 13:21

1 Answers1

1
library(data.table)
setDT(x)
x[, .(
  avg.T_mean_F = mean(T_mean_F),
  avg.P = mean(Precipitation),
  s.Units = sum(Units)
), by = .(ZCTA, Week_ID)]
jangorecki
  • 16,384
  • 4
  • 79
  • 160
Henk
  • 3,634
  • 5
  • 28
  • 54