
In a call to geom_violin within ggplot2, you can specify that the area of each violin should be proportional to the number of observations making up that violin by specifying scale="count".

I assume this operates internally by taking some total amount of area (let's call this amount X) and dividing it proportionally among all violins to be plotted. This is what I want, except that this can result in pretty narrow violins if there is substantial enough disparity in N between groups such that some groups have relatively low N. In my case, this just makes the fill color kind of hard to see.

I think this can be largely solved, in my case at least, by simply expanding X a little bit so that the really small violins get just enough area to still be readable. In other words, I want to retain variation in area between violins according to the number of observations but increase the "pool" of total area being divided amongst violins, so that every one gets slightly bigger.

Anyone have any idea how one might accomplish this? There's gotta be a toggle for this. I've tried fussing with arguments to geom_violin such as width, size, violinwidth, and such, but no luck so far...

EDIT: Code for a boring but reproducible "sample" data set that one can experiment with.

library(ggplot2)

set.seed(1)  # make the random sample reproducible
y <- runif(100, 1, 10)
x <- as.factor(rep(c(1, 2), times = 50))
z <- as.factor(c(rep(1, 10), rep(2, 90)))
df <- data.frame(x, y, z)
ggplot(df, aes(x = x, y = y, fill = z)) + geom_violin(scale = "count")
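
To see how small the low-N violins actually get, one could look at the values ggplot2 computes for this layer. This is only a sketch: it assumes that the data returned by `layer_data()` for `geom_violin` contains `group`, `n` and `violinwidth` columns, which may differ between ggplot2 versions, and `summary_tbl` is just an illustrative name.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  x = as.factor(rep(c(1, 2), times = 50)),
  y = runif(100, 1, 10),
  z = as.factor(c(rep(1, 10), rep(2, 90)))
)

p <- ggplot(df, aes(x = x, y = y, fill = z)) + geom_violin(scale = "count")

# Data as computed by the stat (assumed to include n and violinwidth)
d <- layer_data(p)

# One row per violin: number of observations and its widest relative width
summary_tbl <- aggregate(cbind(n, violinwidth) ~ group, data = d, FUN = max)
print(summary_tbl)
```

Groups with few observations should show a much smaller maximum `violinwidth` than the large groups, which is the narrowness described above.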
Bajcz
  • Please provide a small reproducible example to facilitate testing of potential solutions. – Roland Aug 03 '16 at 15:14
  • Added something boring but hopefully exemplary enough to be useful. – Bajcz Aug 03 '16 at 17:17
  • @Bajcz Have you found any solution yet? – Marek Židek Jul 11 '17 at 09:30
  • @MarkSeygan No, I haven't, but maybe I will try to poke around in the geom_violin code this week and see what I can figure out. – Bajcz Jul 12 '17 at 13:12
  • I found I can do it through `width` inside the `geom_violin` parentheses. – Marek Židek Jul 12 '17 at 17:17
  • Can you expand on this? `width` is marked as a computed variable in the help; others on this list, like `count` and `violinwidth` are ignored when I include them in the `geom_violin` call. So, I'm not sure why `width` is not being ignored, since it is not a function argument. Also, it's not clear to me what `width` is doing...for me, values above ~ 1.5 not only change the shape of my violins, but also their position and orientation on the graph. Any ideas what's going on there? – Bajcz Jul 13 '17 at 18:08

1 Answer

You can do this by adjusting the width parameter inside geom_violin. Make sure to also use position_dodge so that the widened violins do not overlap.

Using your data

ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count", width=2)

will give a plot with wider violins, which can overlap.

You can allow some gap between the plots by using position_dodge:

ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count", width=2, position=position_dodge(width=0.5))

This will give you a non-overlapping plot.
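
As a rough check that a larger width enlarges every violin without changing their relative sizes, one could compare the drawn extents for two width values. This is a sketch under assumptions: `max_extent()` is a hypothetical helper, and it assumes the `layer_data()` output contains `violinwidth`, `xmin`, `xmax` and `group` columns and that the drawn extent is `violinwidth * (xmax - xmin)`, which may not hold in every ggplot2 version.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  x = as.factor(rep(c(1, 2), times = 50)),
  y = runif(100, 1, 10),
  z = as.factor(c(rep(1, 10), rep(2, 90)))
)

# Hypothetical helper: widest drawn extent of each violin, in x-axis units
max_extent <- function(width) {
  p <- ggplot(df, aes(x = x, y = y, fill = z)) +
    geom_violin(scale = "count", width = width,
                position = position_dodge(width = 0.5))
  d <- layer_data(p)
  # Assumed: relative width times the bounding-box width gives the extent
  tapply(d$violinwidth * (d$xmax - d$xmin), d$group, max)
}

narrow <- max_extent(1)
wide <- max_extent(2)

# Per-violin enlargement factor when going from width = 1 to width = 2
ratio <- wide / narrow
print(ratio)
```

If the assumptions hold, the ratio should be the same for every group, meaning each violin grows by the same factor and the count-based differences between violins are retained.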

rm167
  • Great answer, thanks! Can you explain, if you can, why `width` is a computed variable that the model accepts as an argument when other such variables like `count` and `violinwidth` are ignored? I think that's why I was assuming `width` couldn't be the right thing to be manipulating... – Bajcz Sep 06 '17 at 14:26