Group by range and calculate mean in R and graph

Question

I have a data table on cars, with roughly 73 observations of 11 variables. I was tasked with finding the mean of MPG by gear_ratio. This is the output I get from the aggregate function:

dfdata <- data.frame(cars10)
aggregate(x = dfdata$mpg, by = list(dfdata$gear_ratio), FUN = mean)

Gear_ratio	Mean_MPG
2.19	14
2.24	21
2.2.26	25
2.28	17
2.2.47	21
2.53	17
2.56	17.5
2.73	21.3
2.75	20.12
2.87	34
3.05	23.6
3.15	17
3.5	23
3.87	18
3.9	30.3

Next I would like to group these means by ranges of Gear_ratio and graph them. The ranges need to be a) 2.0-2.5, b) 2.5-3.0, c) 3.0-3.5 and d) 3.5-4.0. I'd also like a different color for each group.

I wasn't sure whether there is a way to group by range in the initial aggregate call I used to find the means in the first place.

Does this help: https://stackoverflow.com/questions/32389472/how-to-do-range-grouping-on-a-column-using-dplyr — cazman, Sep 15 '21 at 19:10
Please provide example data in a format that can be readily copied, pasted and worked with. The posted ```Gear_ratio``` doesn't appear to be numeric. — rjen, Sep 15 '21 at 19:11
What sort of graph do you want to produce? And could you clarify what you want plotted? — Mark Davies, Sep 15 '21 at 19:15

Mark Davies · Accepted Answer · 2021-09-15T19:30:33.203

This is what I think the OP is asking for. I'm using the mtcars data for demonstration, with the wt variable instead of gear, as its values are similar to your gear_ratio:

library(dplyr) # for select, mutate, case_when, group_by and summarise
data = mtcars %>%
  select(wt, mpg) %>%
  # form the groups based on ranges; case_when checks the conditions in order,
  # so there is no need for compound conditions such as wt >= 3.0 & wt < 3.5
  mutate(wt_group = case_when(wt > 4    ~ ">4",
                              wt >= 3.5 ~ "3.5-4",
                              wt >= 3.0 ~ "3.0-3.5",
                              wt >= 2.5 ~ "2.5-3.0",
                              wt >= 2.0 ~ "2.0-2.5",
                              wt < 2.0  ~ "<2.0")) %>%
  group_by(wt_group) %>%
  summarise(mean_mpg = mean(mpg))
> head(data)
# A tibble: 6 x 2
  wt_group mean_mpg
  <chr>       <dbl>
1 <2.0         30.5
2 >4           13.0
3 2.0-2.5      25.7
4 2.5-3.0      20.8
5 3.0-3.5      19.3
6 3.5-4        15.7

Then for the plot:

library(ggplot2)
plot = ggplot(data, aes(x= wt_group, y = mean_mpg , fill = wt_group))+
  geom_col()
plot
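
The question also asks for a different color per group. Mapping fill to wt_group already gives each bar a default color. A minimal sketch for choosing the colors explicitly, and for keeping the bars in range order rather than alphabetical order, could look like this. The names summary_df, group_levels, group_colors and p, as well as the colors themselves, are illustrative assumptions and not part of the original answer:

```r
library(dplyr)
library(ggplot2)

# Desired left-to-right order of the bars (assumed; matches the labels above).
group_levels <- c("<2.0", "2.0-2.5", "2.5-3.0", "3.0-3.5", "3.5-4", ">4")

# Rebuild the grouped summary from the answer (mtcars ships with R).
summary_df <- mtcars %>%
  mutate(wt_group = case_when(wt > 4    ~ ">4",
                              wt >= 3.5 ~ "3.5-4",
                              wt >= 3.0 ~ "3.0-3.5",
                              wt >= 2.5 ~ "2.5-3.0",
                              wt >= 2.0 ~ "2.0-2.5",
                              wt < 2.0  ~ "<2.0"),
         # A factor with explicit levels keeps the groups in range order.
         wt_group = factor(wt_group, levels = group_levels)) %>%
  group_by(wt_group) %>%
  summarise(mean_mpg = mean(mpg))

# Arbitrary example colors, one per group; any valid R color names will do.
group_colors <- c("<2.0" = "steelblue", "2.0-2.5" = "darkorange",
                  "2.5-3.0" = "forestgreen", "3.0-3.5" = "firebrick",
                  "3.5-4" = "purple", ">4" = "gray40")

p <- ggplot(summary_df, aes(x = wt_group, y = mean_mpg, fill = wt_group)) +
  geom_col() +
  scale_fill_manual(values = group_colors) +  # one named color per group
  labs(x = "wt group", y = "Mean mpg", fill = "wt group")
p
```

For a dataset without values below 2 or above 4, the corresponding case_when lines, levels and colors can simply be dropped.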

Please edit the question if this doesn't give you what you need, and I can update the answer.

Thank you so much! This worked just as I wanted. I had to modify it since my dataset did not contain a gear_ratio < 2 or > 4. Thanks again! — btk666, Sep 15 '21 at 21:17

score 0 · Answer 2 · answered Sep 15 '21 at 19:35

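library(dplyr)

# df is assumed to hold the Gear_ratio / Mean_MPG table from the question.
# cut() assigns each ratio to a right-closed interval of width 0.5.
df %>%
  mutate(gear_group = cut(Gear_ratio, seq(0, 4, .5)))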

# A tibble: 15 x 3
   Gear_ratio Mean_MPG gear_group
        <dbl>    <dbl> <fct>     
 1       2.19     14   (2,2.5]   
 2       2.24     21   (2,2.5]   
 3       2.26     25   (2,2.5]   
 4       2.28     17   (2,2.5]   
 5       2.47     21   (2,2.5]   
 6       2.53     17   (2.5,3]   
 7       2.56     17.5 (2.5,3]   
 8       2.73     21.3 (2.5,3]   
 9       2.75     20.1 (2.5,3]   
10       2.87     34   (2.5,3]   
11       3.05     23.6 (3,3.5]   
12       3.15     17   (3,3.5]   
13       3.5      23   (3,3.5]   
14       3.87     18   (3.5,4]   
15       3.9      30.3 (3.5,4]
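
The output above only labels each row with its interval. A sketch of the remaining step, taking the mean per interval in base R, might look like the following. It assumes a data frame built from the 15 values posted in the question, reading the two malformed entries as 2.26 and 2.47 as the output above does. It also passes right = FALSE so that a ratio of exactly 3.5 falls into the 3.5-4.0 group, which is only one possible reading of the ranges in the question. The names df, gear_group and group_means are illustrative:

```r
# Values copied from the question's table (2.26 and 2.47 are assumed readings
# of the malformed entries "2.2.26" and "2.2.47").
df <- data.frame(
  Gear_ratio = c(2.19, 2.24, 2.26, 2.28, 2.47, 2.53, 2.56, 2.73, 2.75,
                 2.87, 3.05, 3.15, 3.5, 3.87, 3.9),
  Mean_MPG   = c(14, 21, 25, 17, 21, 17, 17.5, 21.3, 20.12,
                 34, 23.6, 17, 23, 18, 30.3)
)

# Left-closed bins [2,2.5), [2.5,3), [3,3.5), [3.5,4];
# include.lowest = TRUE closes the last bin at 4 when right = FALSE.
df$gear_group <- cut(df$Gear_ratio,
                     breaks = seq(2, 4, by = 0.5),
                     labels = c("2.0-2.5", "2.5-3.0", "3.0-3.5", "3.5-4.0"),
                     right = FALSE, include.lowest = TRUE)

# Mean per bin via aggregate(), here using its formula interface.
group_means <- aggregate(Mean_MPG ~ gear_group, data = df, FUN = mean)
print(group_means)
```

Note that this averages the per-ratio means from the posted table. With the raw data, applying cut() to the raw gear_ratio column and averaging mpg directly would weight each group by its number of observations, so the results could differ.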
