I noticed strange behaviour in the raincloud plot package in R: in some (but not all) cases, the density curves are sensitive to the data values in a way they shouldn't be. The size of the curve drawn by geom_flat_violin() for one group seems to depend on the values in other groups, and I can't find how to make the curves independent again. The only clue I have found is that the curves shrink when the data contain very low values, and the shrinkage affects every curve in the panel where those values occur, not just the sub-group containing them.

Below is a reproducible example, and a link to its image output to show what I mean. A note in advance: the raincloud package (presented in this paper) is not on CRAN as far as I know, so I sourced it directly from the authors' GitHub repo. I also tried an alternative source file, which reproduces the same problem. Other implementations such as ggridges::geom_density_ridges() or {ggdist} didn't offer the same level of control over graphic parameters (e.g. smoothing), unless I'm missing something.

Example code:

library(reshape2)
library(ggplot2)
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R")

# load data and melt into longform
data(iris)
miris <- melt(iris,id.vars = "Species", measure.vars = colnames(iris)[1:4], variable.name = "measurement")

## 1- plotting as is gives horizontally "squashed" curves in two of four panels
ggplot(miris, aes(x = Species, y = value, fill = Species)) +
    geom_flat_violin(position = position_nudge(x = .15, y = 0)) +
    facet_wrap(~measurement) 

## 2- manipulating the group of smallest values seems to fix the relevant panel (but fixing other groups doesn't fix the problem - I tried that) 
airis <- miris
# get indices of data to manipulate
inds <- intersect(which(airis$Species == "setosa"), which(airis$measurement == "Petal.Width"))
# assign larger values
airis$value[inds] <- rnorm(length(inds), 3, 0.5)

ggplot(airis, aes(x = Species, y = value, fill = Species)) +
    geom_flat_violin(position = position_nudge(x = .15, y = 0)) +
    facet_wrap(~measurement) 

## this second plot shows larger distribution curves for all species in the "Petal.Width" panel, although values were only changed for "setosa"

Here's a side-by-side image of the data plotted as is and after adjusting the small values, as done by the code above.
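One possible explanation, offered as an unverified sketch: if the stat behind geom_flat_violin() scales every curve by the largest density found anywhere in the panel (which is how ggplot2's own violin stat behaves with its default scale = "area"), then one tightly clustered group would squash all the others. The base R code below imitates that scaling with made-up values; the two groups, their parameters, and the variable names are all hypothetical and are not taken from the sourced file.

```r
# Sketch only: imitates panel-wide "area" scaling with base R.
# This is NOT the code from the sourced geom_flat_violin.R file.
set.seed(1)

# Two hypothetical groups sharing one panel: one tightly clustered
# near zero, one with a wider spread.
narrow <- rnorm(50, mean = 0.25, sd = 0.1)
wide   <- rnorm(50, mean = 2.0,  sd = 0.3)

# Kernel density estimate for each group, as a violin would draw it.
d_narrow <- density(narrow)
d_wide   <- density(wide)

# "Area"-style scaling: divide each curve's peak by the largest
# density found anywhere in the panel.
panel_max  <- max(d_narrow$y, d_wide$y)
width_area <- c(narrow = max(d_narrow$y), wide = max(d_wide$y)) / panel_max

# "Width"-style scaling: divide each curve's peak by its own maximum,
# so every group reaches the same maximum width.
width_own <- c(narrow = max(d_narrow$y) / max(d_narrow$y),
               wide   = max(d_wide$y)   / max(d_wide$y))

# Relative peak width of each curve under the two scalings.
print(round(width_area, 2))
print(width_own)
```

Under panel-wide scaling, the wide group's peak comes out as a fraction of the narrow group's, which would match the squashed curves in the panels that contain a very tight group. If the assumption holds, passing scale = "width" to geom_flat_violin() may be worth trying, provided the sourced function forwards that argument to the stat.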

Does anyone know where the problem might be, or what can be done to fix it?

Many thanks!

MARO
  • Can you be more specific about what "incorrect" / "unexpected" / "is modified somehow" means to you? Perhaps show a picture comparing what you see and what you expected? – Jon Spring Jun 08 '22 at 23:15
  • The density curves in some panels are flatter than they should be, so I assumed some modification of the density value had occurred. I have edited the question to hopefully clarify what I mean. The code attached includes plots before and after correction of the shape of the curves to show what they should look like. It is not a solution though because it relies on making up values. – MARO Jun 09 '22 at 11:15
