I would like to use ggridges to plot a binned ridgeline with each bin labelled with its percentage. I tried geom_text(stat = "bin") to calculate the percentages, but the calculation uses all the data. I would like the percentages to be calculated separately for each species. Below are the code and the output.

library(ggplot2)
library(ggridges)

# Duplicate the setosa rows so the group sizes become 100, 50, and 50.
iris_mod <- rbind(iris, iris[iris$Species == "setosa", ])

ggplot(iris_mod,aes(x=Sepal.Length, y=Species, fill=Species)) +
  geom_density_ridges(alpha=0.6, stat="binline", binwidth = .5, draw_baseline = FALSE,boundary = 0)+
  geom_text(
    stat = "bin",
    aes(y = group + 0*stat(count/count),
        label = round(stat(count/sum(count)*100),2)),
    vjust = 0, size = 3, color = "black", binwidth = .5, boundary=0)

[Image: plot produced by the code above]

As you can see, the setosa labels are 5, 23, 19, and 3, which add up to 50, while the labels for each of the other two species add up to 25. I want the setosa labels to be 10, 46, 38, and 6, which add up to 100, and the labels for each of the other two species to add up to 100 as well.

stefan
user236321

1 Answer

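Inside after_stat(), count holds the bin counts for all groups at once, so sum(count) is the total over all species. Using e.g. tapply() to compute the sum per group, wrapped in a small custom function, you could do: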

library(ggplot2)
library(ggridges)

iris_mod <- rbind(iris, iris[iris$Species == "setosa", ])

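# Percentage of each bin within its own group; empty bins get a blank label.
comp_pct <- function(count, group) {
  # tapply() returns per-group totals named by group; indexing by group aligns them with count.
  label <- count / tapply(count, group, sum)[as.character(group)] * 100
  ifelse(label > 0, round(label, 2), "")
}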

ggplot(iris_mod, aes(x = Sepal.Length, y = Species, fill = Species)) +
  geom_density_ridges(alpha = 0.6, stat = "binline", binwidth = .5, draw_baseline = FALSE, boundary = 0) +
  geom_text(
    stat = "bin",
    aes(
      y = after_stat(group),
      label = after_stat(comp_pct(count, group))
    ),
    vjust = 0, size = 3, color = "black", binwidth = .5, boundary = 0
  )
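
To double-check the labels outside of ggplot2, the per-species percentages can also be computed with base R. This is only a rough sketch: the break points seq(4, 8, by = 0.5) are my assumption, chosen to mimic binwidth = .5 with boundary = 0 over the range of Sepal.Length, and the names bins, counts and pct are made up for illustration.

```r
# Same modified data as above: setosa rows duplicated (100, 50, 50 rows).
iris_mod <- rbind(iris, iris[iris$Species == "setosa", ])

# Assumed break points: width 0.5, aligned at 0, covering 4.3 to 7.9.
# cut() uses right-closed intervals by default, e.g. (4,4.5], (4.5,5], ...
bins <- cut(iris_mod$Sepal.Length, breaks = seq(4, 8, by = 0.5))

# Bin counts per species, then row-wise percentages (each species sums to 100).
counts <- table(iris_mod$Species, bins)
pct <- round(prop.table(counts, margin = 1) * 100, 2)

# Percentages for the four non-empty setosa bins.
pct["setosa", 1:4]
```

The first four setosa bins should come out as 10, 46, 38 and 6, i.e. the values asked for in the question, and every row of pct sums to 100.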

stefan