I have a data table with two categorical variables and one numeric variable.

Here's code to generate the sample data:

data <- data.frame(system = rep(c("X","Y","Z"), 10), 
                   region = rep(letters[1:5], 6), 
                   value = rnorm(60, 500, 300)) 

Now I want to plot the mean of value for each system-region pair and overlay each system's overall mean on top of it.

Here is the code to build the data for plotting and the first plot:

library(dplyr)
library(ggplot2)

plot_data <- data %>%
  mutate(system = factor(system), region = factor(region)) %>%
  group_by(system, region) %>%
  summarise(avg = mean(value), .groups = "drop") %>%
  # join the per-system mean; the suffix turns its column into avg_all
  left_join(y = data %>% group_by(system) %>% summarise(avg = mean(value), .groups = "drop"),
            by = "system", suffix = c("", "_all")) %>%
  mutate(point_type = ifelse(avg_all > avg, "above", "in"))

ggplot(plot_data, aes(x = region, y = avg, fill = system)) +
  geom_col(position = "dodge") +
  geom_point(aes(y = avg_all), shape = 21, position = position_dodge(width = 0.9))

But now, if I want to add a color aesthetic to geom_point, like this:

ggplot(plot_data, aes(x = region, y = avg, fill = system)) +
  geom_col(position = "dodge") +
  geom_point(aes(y = avg_all, color = point_type), shape = 21, position = position_dodge(width = 0.9))

The graph no longer arranges the points within position_dodge in the same order as the columns. Note that in region 'b' the green and blue points/bars are misaligned, in region 'd' the red and green points/bars are misaligned, and in region 'e' the red, green, and blue points/bars are all misaligned. I cannot figure out why. The misalignment is not systematic. I also tried position = position_dodge2(reverse = TRUE), and that did not fix the problem.

M--
ESELIA

2 Answers

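The issue is that adding the color aesthetic changed the grouping of the data used by geom_point. By default ggplot2 groups each layer by the interaction of all discrete variables mapped in it, so the points are now grouped by point_type as well as system, and position_dodge() places them in that combined group order rather than in system order. To fix it, set the group aesthetic explicitly to tell ggplot2 that the points should be grouped, and therefore dodged, by system.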

library(ggplot2)

ggplot(plot_data, aes(x = region, y = avg, fill = system)) +
  geom_col(position = "dodge") +
  geom_point(aes(y = avg_all, color = point_type, group = system),
    shape = 21, position = position_dodge(width = 0.9)
  )
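
If you would rather verify the alignment than eyeball it, here is a rough sketch. It relies on layer_data(), which returns the data a layer actually draws, including the dodged x positions. The seed and the helper names dodged_x() and aligned() are my own assumptions for illustration, not part of the question.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # assumption: fixed only so the sketch is reproducible
data <- data.frame(system = rep(c("X", "Y", "Z"), 10),
                   region = rep(letters[1:5], 6),
                   value = rnorm(60, 500, 300))

system_means <- data %>%
  group_by(system) %>%
  summarise(avg_all = mean(value), .groups = "drop")

plot_data <- data %>%
  group_by(system, region) %>%
  summarise(avg = mean(value), .groups = "drop") %>%
  left_join(system_means, by = "system") %>%
  mutate(point_type = ifelse(avg_all > avg, "above", "in"))

base <- ggplot(plot_data, aes(x = region, y = avg, fill = system)) +
  geom_col(position = "dodge")

# Points with the implicit grouping (color contributes to the groups)
p_default <- base +
  geom_point(aes(y = avg_all, color = point_type),
             shape = 21, position = position_dodge(width = 0.9))

# Points grouped explicitly by system
p_grouped <- base +
  geom_point(aes(y = avg_all, color = point_type, group = system),
             shape = 21, position = position_dodge(width = 0.9))

# Hypothetical helper: dodged x per region index and fill colour
dodged_x <- function(plot, layer) {
  layer_data(plot, layer) %>%
    transmute(region_id = round(as.numeric(x)), fill, x = as.numeric(x))
}

# Hypothetical helper: TRUE if every point sits at its bar's centre
aligned <- function(plot) {
  both <- inner_join(dodged_x(plot, 1), dodged_x(plot, 2),
                     by = c("region_id", "fill"),
                     suffix = c("_bar", "_point"))
  all(abs(both$x_bar - both$x_point) < 1e-8)
}

aligned(p_grouped)  # the explicit group keeps points on their bars
aligned(p_default)  # FALSE whenever some region is misaligned
```

Matching on the fill colour works here because both layers inherit fill = system, so a bar and its point share the same mapped fill value.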

stefan
  • Thank you! I selected this one as correct because it's a more complete answer. I agree with @M-- though, setting group = system in the first aes is cleaner. I always wondered what the group aesthetic was used for! LOL. – ESELIA Feb 28 '23 at 01:14

I am a little late, and there's already an answer using group. I'd say putting group = system in the first aes, so that it is shared by both geoms, makes more sense.
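
A minimal sketch of that shared-group variant, assuming the same sample data as in the question (the seed and the name p_shared are mine, added only for illustration):

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # assumption: fixed only so the sketch is reproducible
data <- data.frame(system = rep(c("X", "Y", "Z"), 10),
                   region = rep(letters[1:5], 6),
                   value = rnorm(60, 500, 300))

system_means <- data %>%
  group_by(system) %>%
  summarise(avg_all = mean(value), .groups = "drop")

plot_data <- data %>%
  group_by(system, region) %>%
  summarise(avg = mean(value), .groups = "drop") %>%
  left_join(system_means, by = "system") %>%
  mutate(point_type = ifelse(avg_all > avg, "above", "in"))

# group = system sits in the top-level aes(), so both layers inherit it
p_shared <- ggplot(plot_data,
                   aes(x = region, y = avg, fill = system, group = system)) +
  geom_col(position = "dodge") +
  geom_point(aes(y = avg_all, color = point_type),
             shape = 21, position = position_dodge(width = 0.9))

# Both layers should now carry one group per system
bar_groups <- sort(unique(layer_data(p_shared, 1)$group))
point_groups <- sort(unique(layer_data(p_shared, 2)$group))
```

Because the bars and the points are dodged by the same three groups, adding further aesthetics to either layer later should not change their relative order.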

Another option (which would not give exactly the same graph, and which I don't think is a great solution) would be to define color in the first aes and then override it in geom_col.

ggplot(plot_data, aes(x = region, y = avg, fill = system, color = point_type)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0; use size = 0.1 on older versions
  geom_col(position = "dodge", color = "white", linewidth = 0.1) +
  geom_point(aes(y = avg_all), 
             shape = 21, position = position_dodge(width = 0.9))

M--