I want to calculate grouped means of multiple columns in a dataframe. In the process, I will want to retain non-numeric columns that don't vary across with the grouping variable. Here's a simple example.
library(dplyr)
#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'C', 'C'),
state=c('Michigan', 'Michigan', 'Michigan', 'Michigan', 'Michigan','AL', 'AL'),
region=c('Midwest', 'Midwest', 'Midwest', 'Midwest', 'Midwest', 'South', 'South'),
pts=c(5, 8, 14, 18, 5, 7, 7),
rebs=c(8, 8, 9, 3, 8, 7, 4),
ast=c(8,6,7,5,3,0,9))
The resulting data field:
> df
team state region pts rebs ast
1 A Michigan Midwest 5 8 8
2 A Michigan Midwest 8 8 6
3 B Michigan Midwest 14 9 7
4 B Michigan Midwest 18 3 5
5 B Michigan Midwest 5 8 3
6 C Alabama South 7 7 0
7 C Alabama South 7 4 9
Summarizing by mean with 'team' as the grouping variable is straightforward enough:
> df %>%
+ group_by(team) %>%
+ summarise_at(vars(pts, rebs, ast), list(mean))
# A tibble: 3 × 4
team pts rebs ast
<chr> <dbl> <dbl> <dbl>
1 A 6.5 8 7
2 B 12.3 6.67 5
3 C 7 5.5 4.5
But how do I retain those other ID columns (that don't change across within-team stats). In other words, how do I get the following:
team state region pts rebs ast
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 A Michigan Midwest 6.5 8 7
2 B Michigan Midwest 12.3 6.67 5
3 C Alabama South 7 5.5 4.5
Thanks!!