
When I use geom_col to graph the percentage of people who are white in each state (from the midwest dataset in the ggplot2 package), ggplot2 adds the values instead of averaging them. This seems like a very strange default to me - that's not what a bar/column graph 'does' in my opinion. I read the help documentation and did some Googling, but maybe I'm not searching for the right things.

ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_col()

This graph clearly returns the sum of all the values for each state. I want it to return the average for each state. I'm only a few weeks into using R but I can't believe I've never noticed this before.

  • I also tried to replace geom_col with geom_bar(stat = "identity") and that didn't work either. Replacing the whole geom with stat_summary(fun = "mean", geom = "bar") works but I figure there must be a way to do this with geoms. – Hannah Harder Sep 21 '20 at 20:21
  • Although the distinction is not that clear in ggplot2, geoms are better seen as plotting the data unchanged and stats as transforming the data before passing them to a geom. This can be confusing at first, as some geoms, like `geom_bar()` and `geom_smooth()`, by default use a stat that modifies the data before plotting them. – Pedro J. Aphalo Sep 21 '20 at 20:32
  • So would using geom_bar(stat = "summary", fun = "mean") be okay? – Hannah Harder Sep 21 '20 at 20:33
  • It is often easier to pre-calculate what you want to plot - if you use dplyr, you could group_by state and then summarise to find the mean and plot that with geom_col – Richard Telford Sep 21 '20 at 20:53

2 Answers

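The code in the question produces "sums" because geom_col() draws one bar segment per row of the data (here, one per county) and its default is position = "stack", so all the segments for a state are stacked on top of each other.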
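One way to see the stacking for yourself, as a rough sketch: layer_data() from ggplot2 returns the data a layer actually draws, so the top of each stack can be compared with the per-state sums. The object names (p, bars, stack_tops, state_sums) are only illustrative.

```r
library(ggplot2)

# the plot from the question
p <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_col()

# data as drawn by the layer: one row per county,
# with ymin/ymax set by position = "stack"
bars <- layer_data(p)

# top of the stack for each x position (one position per state)
stack_tops <- tapply(bars$ymax, as.integer(bars$x), max)

# plain per-state sums of the original values
state_sums <- tapply(midwest$percwhite, midwest$state, sum)

# compare; these should agree up to floating-point error
all.equal(as.numeric(stack_tops), as.numeric(state_sums))
```

Both vectors are ordered alphabetically by state, which is why they can be compared position by position.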

Here are different possible approaches to producing a figure showing means:

library(ggplot2)

# the normal way of plotting data summaries like means is to use stat_summary()
ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  stat_summary(geom = "col", fun = mean)

# same plot using less intuitive code (avoid if possible)
ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  geom_bar(stat = "summary", fun = mean)

# same plot using base R functions to pre-compute the means
means.df <- aggregate(percwhite ~ state, FUN = mean, data = midwest)

ggplot(data = means.df, mapping = aes(x = state, y = percwhite)) +
  geom_col() # one value per column, stacking has no effect

rm(means.df) # assuming it is no longer needed

# same plot using pipes and dplyr "verbs"
library(dplyr)
midwest %>%
  group_by(state) %>%
  summarise(percwhite = mean(percwhite)) %>%
  ggplot(mapping = aes(x = state, y = percwhite)) +
  geom_col()
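As a quick sanity check (a sketch, not a separate approach), the bar heights computed by stat_summary() can be compared with the pre-computed means. The names p_summary, bar_heights and state_means are made up for this example.

```r
library(ggplot2)

# means computed by ggplot2 while building the plot
p_summary <- ggplot(data = midwest, mapping = aes(x = state, y = percwhite)) +
  stat_summary(geom = "col", fun = mean)

# one row per state; y holds the summarised value
bar_heights <- layer_data(p_summary)$y

# means pre-computed with base R, sorted by state
state_means <- aggregate(percwhite ~ state, FUN = mean, data = midwest)$percwhite

# compare; these should agree up to floating-point error
all.equal(as.numeric(bar_heights), as.numeric(state_means))
```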

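It should be noted that geom_bar() and the newer geom_col() are very similar. The main difference is the default stat: geom_col() uses stat = "identity" and plots the y values as given, while geom_bar() defaults to stat = "count" and can be switched to another stat such as "summary". Note that fun is not a parameter of geom_bar() itself; it is passed on to the "summary" stat.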

Pedro J. Aphalo

First, create a table of means:

library(ggplot2) # provides the midwest dataset and the plotting functions

myTable <- aggregate(percwhite ~ state, FUN = mean, data = midwest)

Now you can use the table to make your bar plot:

ggplot(data = myTable, mapping = aes(x = state, y = percwhite)) +
  geom_col()

Werner Hertzog