I am trying to plot data grouped by one variable as boxplots using ggplot2, and I want to distinguish each plotted data point by its replicate using different symbols.

My data:

Cell_line <- rep(c("B1", "B2", "C"), each = 4)
Condition <- rep(c("1", "2", "3", "4"), times = 3)
rep1 <- c(250.1202269, NA, 87.78025978, 103.7252853, 131.3835253, NA,
          168.8831935, 135.5137408, 137.9377942, NA, 48.73955206, 73.48705161)
rep2 <- c(176.5811282, 165.4414077, 58.18896416, 52.48947013, 214.1871341, 200.8850097,
          312.473565, 194.1484832, 221.5290924, 208.2391158, 107.2347819, 81.38548616)
rep3 <- c(125.0917574, 71.3834596, 40.42846894, 22.41081706, 128.4170654, 114.8438056,
          150.7904802, 112.1023294, 99.56769695, 135.9090866, 93.05268714, 39.17564189)

df <- data.frame(Cell_line, Condition, rep1, rep2, rep3)

I can plot it fine without the different symbols using geom_beeswarm to add the points:

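# Packages the pipeline below relies on
library(ggplot2)
library(tidyr)      # pivot_longer()
library(dplyr)      # %>% and mutate()
library(forcats)    # fct_relevel()
library(ggbeeswarm) # geom_beeswarm()

df %>%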
  pivot_longer(cols = rep1:rep3, names_to = "replicate", values_to = "expression") %>% 
  mutate(Condition = fct_relevel(Condition, 
                                 "1", "2", "3", "4")) %>% 
  ggplot(aes(x=Condition, y=expression, colour = Cell_line))+
  geom_boxplot()+
  geom_beeswarm(dodge.width=0.75, size=2.5)

(https://i.stack.imgur.com/SvnWv.png)

Everything is fine until I try to change the symbols using geom_point. The points are then scattered instead of lining up along the centre of their respective boxplots.

df %>% 
  pivot_longer(cols = rep1:rep3, names_to = "replicate", values_to = "expression") %>% 
  mutate(Condition = fct_relevel(Condition, 
                                   "1", "2", "3", "4")) %>% 
  ggplot(aes(x=Condition, y=expression, colour = Cell_line))+
  geom_boxplot()+
  geom_point(aes(colour=Cell_line, shape = replicate), position=position_dodge(width=1), size=3)+
  scale_shape_manual(values=c(15, 16, 17))

(https://i.stack.imgur.com/QAkrS.png)

How can I fix this so it appears like the first plot except with different symbols?

stefan
Ema

1 Answer


The issue is that adding the shape aesthetic changes the grouping of your data and hence the dodging. To fix that, you have to set the group aesthetic explicitly. Additionally, I have set position_dodge(width = .75) to align the points with the boxplots.

library(ggplot2)
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
library(forcats)

df %>%
  pivot_longer(cols = rep1:rep3, names_to = "replicate", values_to = "expression") %>%
  mutate(Condition = fct_relevel(
    Condition,
    "1", "2", "3", "4"
  )) %>%
  ggplot(aes(x = Condition, y = expression, colour = Cell_line)) +
  geom_boxplot() +
  geom_point(aes(colour = Cell_line, shape = replicate, group = Cell_line),
    position = position_dodge(width = .75), size = 3
  ) +
  scale_shape_manual(values = c(15, 16, 17))
#> Warning: Removed 3 rows containing non-finite values (`stat_boxplot()`).
#> Warning: Removed 3 rows containing missing values (`geom_point()`).
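
To see why the explicit group matters, here is a minimal sketch on made-up data (the `toy` data frame and the `n_default` / `n_grouped` names are only for illustration, not part of the question). It compares how many groups the point layer ends up with when shape is mapped with and without `group = Cell_line`, using `layer_data()` to inspect the built layer. Without an explicit group, ggplot2 groups by the interaction of the discrete variables in the layer, so the replicate splits each cell line into several dodge groups.

```r
library(ggplot2)

# Toy data (hypothetical): 2 conditions x 3 cell lines x 2 replicates
toy <- data.frame(
  Condition  = rep(c("1", "2"), each = 6),
  Cell_line  = rep(rep(c("B1", "B2", "C"), each = 2), times = 2),
  replicate  = rep(c("rep1", "rep2"), times = 6),
  expression = seq_len(12)
)

# Shape mapped, no explicit group: the replicate takes part in the grouping
p_default <- ggplot(toy, aes(Condition, expression, colour = Cell_line)) +
  geom_point(aes(shape = replicate),
             position = position_dodge(width = 0.75))

# Shape mapped, group fixed to Cell_line: points are dodged per cell line only
p_grouped <- ggplot(toy, aes(Condition, expression, colour = Cell_line)) +
  geom_point(aes(shape = replicate, group = Cell_line),
             position = position_dodge(width = 0.75))

# Count the distinct groups in the built point layer of each plot
n_default <- length(unique(layer_data(p_default)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))

n_default
n_grouped
```

The grouped version should report one group per cell line, while the default version reports more. That difference is what makes the points dodge to positions that no longer match the boxplots.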

stefan