2

I'm a little embarrassed to ask this question, but I've spent the better part of my work day trying to find a solution, and yet here I am...

What I'm aiming for is a simple ridgeline plot of several normal distributions which are calculated from given means and SDs in my data, like in this example:

case_number    caseMean    caseSD
case1          0           1
case2          1           2
case3          3           3

All the examples I've found work with series of measurements, like the one with the temperatures in Lincoln, NE in the ggridges introduction vignette (https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html), and I cannot get them to work with my data.

As for my experience with R: I am not a complete idiot when it comes to data analysis, and proper visualization is something I am eager to learn, but right now I simply need a solution to my problem.

Thank you very much for your help!

Marcus Campbell
Klaus Peter

2 Answers

3

Edit -- added precise theoretical answer.

Here's a way using dnorm to construct exact normal curves to those specifications:

library(tidyverse); library(ggridges)
n = 100
df3 <- df %>%
  mutate(low  = caseMean - 3 * caseSD, high = caseMean + 3 * caseSD) %>%
  uncount(n, .id = "row") %>%
  mutate(x    = (1 - row/n) * low + row/n * high, 
         norm = dnorm(x, caseMean, caseSD))
ggplot(df3, aes(x, case_number, height = norm)) +
  geom_ridgeline(scale = 3)
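
For comparison, here is a minimal base-R sketch of the same idea, assuming the sample data from the question. It builds the evaluation grid with seq() instead of uncount(), and it also checks how much of each curve the mean ± 3 SD range covers. The names `n_points`, `curves`, and `coverage` are made up for this illustration; the plotting step only runs if ggplot2 and ggridges are installed.

```r
# Sample data from the question
df <- data.frame(
  case_number = c("case1", "case2", "case3"),
  caseMean    = c(0, 1, 3),
  caseSD      = c(1, 2, 3)
)

n_points <- 100  # hypothetical name: grid points per curve

# One evenly spaced grid per case, spanning mean +/- 3 SD,
# with the exact normal density evaluated at each grid point
curves <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  x <- seq(df$caseMean[i] - 3 * df$caseSD[i],
           df$caseMean[i] + 3 * df$caseSD[i],
           length.out = n_points)
  data.frame(case_number = df$case_number[i],
             x           = x,
             norm        = dnorm(x, df$caseMean[i], df$caseSD[i]))
}))

# Share of a normal distribution that lies within +/- 3 SD of its mean
coverage <- pnorm(3) - pnorm(-3)

# Optional: build the ridgeline plot if the plotting packages are available
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggridges", quietly = TRUE)) {
  p <- ggplot2::ggplot(curves, ggplot2::aes(x, case_number, height = norm)) +
    ggridges::geom_ridgeline(scale = 3)
}
```

Since no random sampling is involved, 100 points per curve should be enough for a smooth line, and the ± 3 SD window cuts off only about 0.3% of each distribution.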

Similar to Sada93's answer, using dplyr and tidyr:

library(tidyverse); library(ggridges)
n = 50000
df2 <- df %>% 
  uncount(n) %>%
  mutate(value = rnorm(n(), caseMean, caseSD))
ggplot(df2, aes(x = value, y = case_number)) + geom_density_ridges()

sample data:

df <- read.table(
  header = TRUE, 
  stringsAsFactors = FALSE,
  text = "case_number    caseMean    caseSD
case1          0           1
case2          1           2
case3          3           3")
Jon Spring
1

You need to create a new data frame with values sampled from each distribution and then use ggridges as follows:

library(ggplot2)
library(ggridges)

data = data.frame(case = c("case1","case2","case3"),caseMean = c(0,1,3),caseSD = c(1,2,3))

#Create 100 rows for each mean and SD

data_plot = data.frame(case = character(),value = numeric())
n = 100
for(i in 1:nrow(data)){

  case = data$case[i]
  mean = data$caseMean[i]
  sd = data$caseSD[i]

  val = rnorm(n,mean,sd)

  data_plot = rbind(data_plot,
                    data.frame(case = rep(case,n),
                               value = val))
}

ggplot(data = data_plot,aes(x = value,y = case))+geom_density_ridges()
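
If the loop becomes slow for larger sample sizes, one possible alternative is to draw all samples in a single call, since rnorm() is vectorized over its mean and sd arguments. This is only a sketch under the same sample data; the data frame name `cases`, the seed, and the sample size are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Same sample data as above, under a hypothetical name
cases <- data.frame(case     = c("case1", "case2", "case3"),
                    caseMean = c(0, 1, 3),
                    caseSD   = c(1, 2, 3))

n <- 10000  # samples per case

# rep(..., each = n) repeats each case's parameters n times, so one
# rnorm() call draws n values per case without growing a data frame
data_plot <- data.frame(
  case  = rep(cases$case, each = n),
  value = rnorm(n * nrow(cases),
                mean = rep(cases$caseMean, each = n),
                sd   = rep(cases$caseSD,   each = n))
)
```

The resulting `data_plot` has the same columns as the one built by the loop, so it can be passed to the same ggplot() call.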

Sada93
  • Thank you very much for this suggestion! For my needs right now this is sufficient, although I have to increase the number of data points for each case so that it mimics a normal distribution, which takes some time on my old machine. Perhaps somebody has an additional suggestion, which is why I am leaving this open for now. Again, thank you very much! – Klaus Peter Jun 04 '19 at 15:30