I'm running into what is probably a rare situation: I have values for several groups that I'd like to plot using R's ggplot2 with geom_violin + geom_boxplot, filling and coloring the violins by group, and coloring the boxes by group as well. Occasionally, one or more of the groups has fewer than three values, for example:
set.seed(1)
df <- data.frame(group = c(rep("A", 100), rep("B", 100), rep("C", 2), "D"),
                 value = c(rnorm(100, 1, 1), rnorm(100, 2, 1), rnorm(2, 3, 1), rnorm(1, 1, 1)))
My ggplot2 code is:
library(ggplot2)
ggplot(df, aes(x = group, y = value)) +
  geom_violin(aes(fill = group, color = group), alpha = 0.3) +
  geom_boxplot(width = 0.1, aes(color = group), fill = NA) +
  theme_minimal() + ylab("Value") +
  theme(legend.title = element_blank(), axis.ticks.x = element_blank(),
        axis.text.x = element_blank(), axis.title.x = element_blank())
The undesired behavior is that the legend gets split into two. I imagine this happens because groups C and D cannot be represented by violins due to insufficient points.
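To see which groups fall below that size, a quick check of the group counts can help. This is only a minimal sketch: the cutoff of three points is an assumption inferred from the behavior described here, not a documented ggplot2 constant, and min_points and small_groups are illustrative names.

```r
# Recreate the data from the first example
set.seed(1)
df <- data.frame(group = c(rep("A", 100), rep("B", 100), rep("C", 2), "D"),
                 value = c(rnorm(100, 1, 1), rnorm(100, 2, 1), rnorm(2, 3, 1), rnorm(1, 1, 1)))

# Count observations per group
group_sizes <- table(df$group)

# Assumed cutoff: groups with fewer points than this seem to get no violin
min_points <- 3

# Names of the groups that fall below the assumed cutoff
small_groups <- names(group_sizes)[group_sizes < min_points]
print(small_groups)
```

With this data, the groups flagged are C and D, which are the ones missing from the violin part of the legend.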
Increasing the number of points of groups C and D to 3 gives the desired behavior with the same code:
set.seed(1)
df <- data.frame(group = c(rep("A", 100), rep("B", 100), rep("C", 3), rep("D", 3)),
                 value = c(rnorm(100, 1, 1), rnorm(100, 2, 1), rnorm(3, 3, 1), rnorm(3, 1, 1)))
df$group <- factor(df$group, levels = c("A","B","C","D"))
My question is whether it is possible to force my ggplot2 code to always give a single legend, as in the second example, even if a group has only one point.
I know that I can artificially inflate such groups by adding pseudo-counts for them, but I'd rather stay faithful to the data in this case.