I'm trying to use facet_grid to produce several plots where each plot's percentage labels add to 100%.

In the image provided, the percentage labels add to 49% (first facet) and 51% (second facet).

I've seen this Question where the solution is to aggregate the data outside ggplot. I'd rather not do that; I believe there is a better approach.

library("ggplot2")
library("scales")

set.seed(123)

df <- data.frame(x = rnorm(10000, mean = 100, sd = 50))

df$factor_variable <- cut(df$x, right = TRUE, 
                          breaks = c(0, 25, 50, 100, 200, 10000),
                          labels = c("0 - 25", "26 - 50", "51 - 100", "101 - 200", "> 200")
                          )

df$second_factor_variable <- ifelse(df$x < 100, 1, 2)

df <- subset(df, x > 0)

table(df$second_factor_variable)

p1 <- ggplot(df, aes(x = factor_variable, y = (..count..)/sum(..count..), ymax = 0.8))
p1 <- p1 + geom_bar(fill = "deepskyblue3", width=.5)
p1 <- p1 + stat_bin(geom = "text",
                    aes(label = paste(round((..count..)/sum(..count..)*100), "%")),
                    vjust = -1, color = "grey30", size = 6)
p1 <- p1 + xlab(NULL) + ylab(NULL)
p1 <- p1 + scale_y_continuous(label = percent_format())
p1 <- p1 + xlim("0 - 25", "26 - 50", "51 - 100", "101 - 200", "> 200")
p1 <- p1 + facet_grid(. ~ second_factor_variable)

print(p1)

Here is the attempt

Claus Wilke
marbel

1 Answer

This method works for the time being. However, the PANEL variable isn't documented and, according to Hadley, shouldn't be used. It seems the "correct" way is to aggregate the data and then plot it; there are many examples of this on SO.

ggplot(df, aes(x = factor_variable, y = (..count..)/ sapply(PANEL, FUN=function(x) sum(count[PANEL == x])))) +
                 geom_bar(fill = "deepskyblue3", width=.5) +
                 stat_bin(geom = "text",
                          aes(label = paste(round((..count..)/ sapply(PANEL, FUN=function(x) sum(count[PANEL == x])) * 100), "%")),
                          vjust = -1, color = "grey30", size = 6) +
                 facet_grid(. ~ second_factor_variable)
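
For comparison, here is a minimal sketch of the aggregate-then-plot route mentioned above. It is not taken from the linked question; the names `agg`, `n`, `share` and `label` are my own, and the aggregation uses base R (`table()` and `ave()`) purely as one possible choice. The plotting part is guarded with `requireNamespace()` so the aggregation can be checked on its own.

```r
set.seed(123)

# Same simulated data as in the question
df <- data.frame(x = rnorm(10000, mean = 100, sd = 50))
df <- subset(df, x > 0)
df$factor_variable <- cut(df$x, right = TRUE,
                          breaks = c(0, 25, 50, 100, 200, 10000),
                          labels = c("0 - 25", "26 - 50", "51 - 100", "101 - 200", "> 200"))
df$second_factor_variable <- ifelse(df$x < 100, 1, 2)

# Count rows for every (bin, facet) combination, in long format
agg <- as.data.frame(table(factor_variable = df$factor_variable,
                           second_factor_variable = df$second_factor_variable),
                     responseName = "n")

# Share of each bin within its own facet, so every facet sums to 100%
agg$share <- agg$n / ave(agg$n, agg$second_factor_variable, FUN = sum)
agg$label <- paste0(round(agg$share * 100), "%")

# Drop empty combinations so no "0%" labels are drawn
agg <- subset(agg, n > 0)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library("ggplot2")
  p <- ggplot(agg, aes(x = factor_variable, y = share)) +
    geom_bar(stat = "identity", fill = "deepskyblue3", width = 0.5) +
    geom_text(aes(label = label), vjust = -1, color = "grey30", size = 6) +
    # Upper limit above 1 leaves headroom for the text labels
    scale_y_continuous(labels = scales::percent, limits = c(0, 1.1)) +
    xlab(NULL) + ylab(NULL) +
    facet_grid(. ~ second_factor_variable)
  print(p)
}
```

Because the shares are computed per facet before plotting, nothing here relies on `PANEL` or on any other internal variable.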

marbel
  • Where is the PANEL variable documented? – jlhoward Dec 16 '13 at 02:09
  • @jlhoward Thanks for asking about the [panel variable](http://stackoverflow.com/questions/20622332/documentation-on-internal-variables-in-ggplot-esp-panel). – marbel Dec 17 '13 at 03:34
  • This is an interesting approach, but see Hadley Wickham's comments to my question [here](http://www.stackoverflow.com/questions/20622332/). BTW, I have no idea why your response was downrated; it certainly wasn't me. – jlhoward Dec 17 '13 at 06:03
  • @jlhoward Sure, I've seen the comment. The code outputs some warnings, so it's clearly not perfect... I was hoping there was a better method than just aggregating before plotting, but this doesn't seem to be possible. – marbel Dec 17 '13 at 21:13
  • I'm curious, since it's been 2+ years and a few ggplot2 iterations, is this still the recommended approach? ..PANEL.. hacks do seem hacky. Has this feature been requested? Stats and facets have always confused me, though. – Mike Dolan Fliss Jul 26 '16 at 23:33
  • It's not recommended to use `PANEL`! This is what the answer is saying. You should aggregate and then pass that to `ggplot`. – marbel Jul 28 '16 at 00:09