I've run into what is probably a rare situation: I have values for several groups that I'd like to plot with ggplot2's geom_violin + geom_boxplot, filling and coloring the violins by group and coloring the boxes by group as well. Occasionally, one or more of the groups has fewer than three values, for example:

set.seed(1)

df <- data.frame(group = c(rep("A",100),rep("B",100),rep("C",2),"D"),
                 value = c(rnorm(100,1,1), rnorm(100,2,1), rnorm(2,3,1), rnorm(1,1,1)))

My ggplot2 code is:

library(ggplot2)
ggplot(df,aes(x=group,y=value)) + 
  geom_violin(aes(fill=group,color=group),alpha=0.3) +
  geom_boxplot(width=0.1,aes(color=group),fill=NA) +
  theme_minimal() + ylab("Value") + theme(legend.title=element_blank(),axis.ticks.x=element_blank(),axis.text.x=element_blank(),axis.title.x=element_blank())

For this example, the resulting plot has the undesired behavior that the legend gets split into two. I imagine this happens because groups C and D cannot be represented by violins due to insufficient points.
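To see which groups are affected, a quick per-group count helps. This is only a sketch on the example data above: the cutoff of three is my working assumption from what I observed, not a documented ggplot2 threshold, and `counts` and `small_groups` are names made up for illustration.

```r
# Example data from above: groups C and D have very few points.
set.seed(1)
df <- data.frame(
  group = c(rep("A", 100), rep("B", 100), rep("C", 2), "D"),
  value = c(rnorm(100, 1, 1), rnorm(100, 2, 1), rnorm(2, 3, 1), rnorm(1, 1, 1))
)

# Number of observations per group.
counts <- table(df$group)

# Groups below the assumed cutoff of three values.
small_groups <- names(counts)[counts < 3]
print(small_groups)
```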

Increasing the number of points of groups C and D to 3 gives the desired behavior with the same code:

set.seed(1)

df <- data.frame(group = c(rep("A",100),rep("B",100),rep("C",3),rep("D",3)),
                 value = c(rnorm(100,1,1), rnorm(100,2,1), rnorm(3,3,1), rnorm(3,1,1)))

df$group <- factor(df$group, levels = c("A","B","C","D"))

Is it possible to force my ggplot2 code to always give a single legend, as in the second example, even if a group has only one point?

I know that I can artificially inflate such groups by adding pseudo-counts for them, but I'd rather stay faithful to the data in this case.

dan
  • Add scale fill custom, then hide legend for violin. – zx8754 Sep 02 '20 at 19:17
  • One option is to set the limits of the fill, like `scale_fill_discrete(limits = unique(df$group) )`. I always think this sort of issue is where `drop = FALSE` would work but it doesn't seem to here. – aosmith Sep 02 '20 at 19:18

1 Answer

You can just specify limits to include all the levels:

library(ggplot2)

ggplot(df, aes(x = group, y = value)) + 
  geom_violin( aes(color = group,  fill = group), alpha = 0.3) +
  geom_boxplot(aes(color = group), fill = NA, width = 0.1) +
  scale_fill_manual(limits = c("A", "B", "C", "D"),
                    values = scales::hue_pal()(4),
                    drop   = FALSE) +
  ylab("Value") + 
  theme_minimal() + 
  theme(legend.title = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.x  = element_blank(),
        axis.title.x = element_blank())

Allan Cameron
  • Thanks a lot @Allan Cameron. Wouldn't it be safer to add: `scale_fill_manual(limits = c("A", "B", "C", "D"), values = scales::hue_pal()(4), drop = FALSE)`? – dan Sep 04 '20 at 18:13
  • @dan isn't that what it has already? Do you mean `scale_color_manual`? – Allan Cameron Sep 04 '20 at 19:02
  • Yep, sorry for the mistake and thanks for correcting it – dan Sep 04 '20 at 22:56
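Following up on the comments above, here is a minimal sketch that gives the color scale the same limits and values as the fill scale, so both legends carry identical keys and can be merged into one. It reuses the data from the question. `group_levels` and `group_colours` are names made up for illustration, and deriving the limits from the sorted unique group values (rather than hard-coding them as in the answer) is an assumption on my part.

```r
library(ggplot2)

# Data from the question: groups C and D have very few points.
set.seed(1)
df <- data.frame(
  group = c(rep("A", 100), rep("B", 100), rep("C", 2), "D"),
  value = c(rnorm(100, 1, 1), rnorm(100, 2, 1), rnorm(2, 3, 1), rnorm(1, 1, 1))
)

# Hypothetical helpers: every group present in the data, and one default
# hue per group (the same palette call the answer uses).
group_levels  <- sort(unique(as.character(df$group)))
group_colours <- scales::hue_pal()(length(group_levels))

p <- ggplot(df, aes(x = group, y = value)) +
  geom_violin(aes(color = group, fill = group), alpha = 0.3) +
  geom_boxplot(aes(color = group), fill = NA, width = 0.1) +
  # Same limits and values on both scales, so the legend keys match.
  scale_fill_manual(limits = group_levels, values = group_colours) +
  scale_color_manual(limits = group_levels, values = group_colours) +
  ylab("Value") +
  theme_minimal() +
  theme(legend.title = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.x  = element_blank(),
        axis.title.x = element_blank())

print(p)
```

Taking the limits from the data means the scales do not need editing when a group is added, at the cost of alphabetical ordering unless you supply your own order.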