I have a data frame grouped by one variable, group.var, whose other variables fall into two sets: for one set I need the mean within each group, and for the other I need the sum within each group. That is, I want to apply two different summary functions to two different sets of variables after a chain of other operations (such as filter and select; my real problem is more complicated than this example).
> head(df, 10)
group.var x1 x2 x3 y1 y2 y3
1 1 460 477 236 65 142 384
2 1 88 336 114 93 378 52
3 1 93 290 353 384 498 43
4 1 394 105 306 172 216 267
5 1 402 145 423 425 125 322
6 2 187 473 466 279 81 484
7 2 465 373 50 422 136 78
8 2 404 455 362 205 315 12
9 2 54 202 242 348 324 275
10 2 340 380 14 442 376 491
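For anyone who wants to try this, the printed rows can be turned into a reproducible data frame. This sketch assumes the ten rows shown are the whole data set (the expected result further down is consistent with exactly these rows) and uses base R's read.table():

```r
# Reproducible version of the data, assuming head(df, 10) shows every row
df <- read.table(header = TRUE, text = "
group.var  x1  x2  x3  y1  y2  y3
        1 460 477 236  65 142 384
        1  88 336 114  93 378  52
        1  93 290 353 384 498  43
        1 394 105 306 172 216 267
        1 402 145 423 425 125 322
        2 187 473 466 279  81 484
        2 465 373  50 422 136  78
        2 404 455 362 205 315  12
        2  54 202 242 348 324 275
        2 340 380  14 442 376 491
")

str(df)  # 10 observations of 7 integer variables
```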
Ideally I want to use dplyr's summarize_at function twice in the same chain, applying mean to the first set of variables and sum to the second in two separate operations. But the first summarize_at returns only the grouping variable and the summarized columns, so the second call cannot find the second set of variables:
> df1 <- df %>%
+ select(group.var, x1:xn, y1:yn) %>% # just for reference
+ filter(x2 != 20) %>% # just for reference
+ group_by(group.var) %>%
+ summarize_at(vars(x1:xn), mean) %>%
+ summarize_at(vars(y1:ym), sum)
Error in is_character(x, encoding = encoding, n = 1L) :
object 'y1' not found
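The failure can be seen by stopping the chain after the first summarize_at(): only the grouping column and the averaged x columns are left, so there are no y columns for the second call to select. A minimal sketch, assuming dplyr is installed and using x1:x3 in place of the placeholder range x1:xn:

```r
library(dplyr)

df <- read.table(header = TRUE, text = "
group.var  x1  x2  x3  y1  y2  y3
        1 460 477 236  65 142 384
        1  88 336 114  93 378  52
        1  93 290 353 384 498  43
        1 394 105 306 172 216 267
        1 402 145 423 425 125 322
        2 187 473 466 279  81 484
        2 465 373  50 422 136  78
        2 404 455 362 205 315  12
        2  54 202 242 348 324 275
        2 340 380  14 442 376 491
")

# First summarising step only: one row per group, means of x1:x3
step1 <- df %>%
  group_by(group.var) %>%
  summarize_at(vars(x1:x3), mean)

# The y columns were dropped by summarize_at(), so vars(y1:y3) has nothing to match
print(names(step1))  # group.var, x1, x2, x3
```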
I can write two snippets that do the same grouping, selecting and filtering but different summarizing (using the summarize_all function), and then join the two grouped data frames on group.var, but I'm looking for a more efficient method.
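The two-snippet workaround can at least share the selecting, filtering and grouping steps by storing the grouped data once. A sketch, assuming dplyr (summarize_at() is superseded by across() as of dplyr 1.0.0 but still available), with x1:x3 and y1:y3 standing in for the placeholder ranges; the intermediate names grouped, means and sums are illustrative only:

```r
library(dplyr)

df <- read.table(header = TRUE, text = "
group.var  x1  x2  x3  y1  y2  y3
        1 460 477 236  65 142 384
        1  88 336 114  93 378  52
        1  93 290 353 384 498  43
        1 394 105 306 172 216 267
        1 402 145 423 425 125 322
        2 187 473 466 279  81 484
        2 465 373  50 422 136  78
        2 404 455 362 205 315  12
        2  54 202 242 348 324 275
        2 340 380  14 442 376 491
")

# Shared steps: select, filter and group only once
grouped <- df %>%
  select(group.var, x1:x3, y1:y3) %>%
  filter(x2 != 20) %>%
  group_by(group.var)

# Two summaries of the same grouped data, one per variable set
means <- grouped %>% summarize_at(vars(x1:x3), mean)
sums  <- grouped %>% summarize_at(vars(y1:y3), sum)

# Join the per-group results back together on the grouping variable
result <- inner_join(means, sums, by = "group.var")
print(result)
```

This still summarizes twice and joins; it only avoids repeating the upstream steps.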
The end result I want is:
group.var x1 x2 x3 y1 y2 y3
1 1 287.4 270.6 286.4 1139 1359 1068
2 2 290.0 376.6 226.8 1696 1232 1340
Any suggestions, preferably using dplyr or data.table?