
How would I plot a bar chart showing the percentage within each gender for each of the different levels of var?

The data can be built as follows:

dat <- structure(list(var = structure(c(5L, 5L, 5L, 6L, 5L, 4L, 5L, 
6L, 6L, 6L, 5L, 5L, 5L, 6L, 6L, 5L, 6L, 5L, 6L, 5L), .Label = c("-97:\nMultiple\nResponse", 
"-99:\nRefused", "1:\nDefinitely", "2:\nProbably", "3:\nProbably\nnot", 
"4:\nDefinitely\nnot"), class = "factor"), GENDER = structure(c(1L, 
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 
1L, 2L, 1L), .Label = c("1: Male", "2: Female", "3: Unknown"), class = "factor")), .Names = c("var", 
"GENDER"), row.names = c(NA, 20L), class = "data.frame")

I want the bars within each gender to add up to 100%.
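To make the target concrete, here is a minimal base-R sketch of the numbers being asked for. It rebuilds the 20-row sample with factor() (same codes and labels as the dput() output above), drops the empty "3: Unknown" gender level, and uses prop.table(margin = 2) so that each gender column sums to 1. The object names tab and within_gender are illustrative only.

```r
# Rebuild the 20-row sample from the question (same codes and labels as the dput() output)
var_labels <- c("-97:\nMultiple\nResponse", "-99:\nRefused", "1:\nDefinitely",
                "2:\nProbably", "3:\nProbably\nnot", "4:\nDefinitely\nnot")
gender_labels <- c("1: Male", "2: Female", "3: Unknown")
dat <- data.frame(
  var = factor(c(5, 5, 5, 6, 5, 4, 5, 6, 6, 6, 5, 5, 5, 6, 6, 5, 6, 5, 6, 5),
               levels = 1:6, labels = var_labels),
  GENDER = factor(c(1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1),
                  levels = 1:3, labels = gender_labels)
)

# Cross-tabulate var (rows) by GENDER (columns); droplevels() removes the empty
# "3: Unknown" column, which would otherwise give 0/0 = NaN below
tab <- table(dat$var, droplevels(dat$GENDER))

# margin = 2: divide each cell by its column (gender) total, so every column sums to 1
within_gender <- prop.table(tab, margin = 2)
round(100 * within_gender, 1)
```

In this sample, for example, 7 of the 11 men answered "3: Probably not", so that cell is about 63.6%.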

vashts85

1 Answer


Summarise the data to get the percentage of each GENDER within each level of var. Below, I use dplyr to do that on the fly within the call to ggplot. I've called your data frame dat:

library(ggplot2)
library(dplyr)
library(scales)

ggplot(dat %>% group_by(var, GENDER) %>%
         tally %>%
         mutate(pct=n/sum(n)), aes(var, pct, fill=GENDER)) +
  geom_bar(stat="identity") +
  scale_y_continuous(labels=percent_format())
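Why the shares from this pipeline add up to 100% within each level of var (rather than within each gender) comes down to grouping: tally() summarises away the last grouping variable, GENDER, so its result is still grouped by var, and mutate() then divides by each var total. A small check, rebuilding the sample data with factor(); the names summ and totals are just for illustration:

```r
library(dplyr)

# Rebuild the 20-row sample from the question (same codes and labels as the dput() output)
var_labels <- c("-97:\nMultiple\nResponse", "-99:\nRefused", "1:\nDefinitely",
                "2:\nProbably", "3:\nProbably\nnot", "4:\nDefinitely\nnot")
gender_labels <- c("1: Male", "2: Female", "3: Unknown")
dat <- data.frame(
  var = factor(c(5, 5, 5, 6, 5, 4, 5, 6, 6, 6, 5, 5, 5, 6, 6, 5, 6, 5, 6, 5),
               levels = 1:6, labels = var_labels),
  GENDER = factor(c(1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1),
                  levels = 1:3, labels = gender_labels)
)

summ <- dat %>%
  group_by(var, GENDER) %>%
  tally() %>%                 # n per var x GENDER; GENDER is peeled off, grouping by var remains
  mutate(pct = n / sum(n))    # so the denominator is the row count for that level of var

group_vars(summ)              # "var"

# Every level of var that occurs in the data sums to 1 across genders
totals <- summ %>% summarise(total = sum(pct))
totals
```

Swapping the order to group_by(GENDER, var), as in the follow-up comment on this answer, leaves the result grouped by GENDER, so the percentages then sum to 100% within each gender instead.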


UPDATE: To keep levels of var that have no observations on the x-axis, add scale_x_discrete(drop=FALSE):

ggplot(dat %>% group_by(var, GENDER) %>%
         tally %>%
         mutate(pct=n/sum(n))) +
  geom_bar(stat="identity", aes(var, pct, fill=GENDER)) +
  scale_y_continuous(labels=percent_format()) +
  scale_x_discrete(drop=FALSE)
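The reason drop=FALSE is needed: after group_by() and tally(), the var column is still a factor carrying all six levels, but only three of them occur in this sample, and ggplot2's discrete scales drop unused factor levels by default. A quick check of that, again rebuilding the sample data (summ is an illustrative name):

```r
library(dplyr)

# Rebuild the 20-row sample from the question (same codes and labels as the dput() output)
var_labels <- c("-97:\nMultiple\nResponse", "-99:\nRefused", "1:\nDefinitely",
                "2:\nProbably", "3:\nProbably\nnot", "4:\nDefinitely\nnot")
gender_labels <- c("1: Male", "2: Female", "3: Unknown")
dat <- data.frame(
  var = factor(c(5, 5, 5, 6, 5, 4, 5, 6, 6, 6, 5, 5, 5, 6, 6, 5, 6, 5, 6, 5),
               levels = 1:6, labels = var_labels),
  GENDER = factor(c(1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1),
                  levels = 1:3, labels = gender_labels)
)

summ <- dat %>%
  group_by(var, GENDER) %>%
  tally() %>%
  mutate(pct = n / sum(n))

nlevels(summ$var)     # all six response levels are still attached to the factor
n_distinct(summ$var)  # but only three of them appear as rows
# The levels an x-axis with the default drop = TRUE would leave out
setdiff(levels(summ$var), as.character(summ$var))
```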
eipi10
  • This doesn't quite work. The first level is missing, and each response category adds up to 100%. – vashts85 May 03 '16 at 18:03
  • Some of the levels are missing because they are not present in the sample of data you provided. Regarding the bars summing to 100%, it seemed like your question was asking for the bars to sum to 100% for each value of var. If I misinterpreted, please explain how you want the data grouped. – eipi10 May 03 '16 at 18:06
  • Actually, your workflow works really well; it just took me a minute. How would I change the order a bit so that all values of `var` are shown? Here is what I tried, but it omits the values of `var` that are not present in the data: ggplot(dat %>% group_by(GENDER, var) %>% tally %>% mutate(pct=n/sum(n)), aes(GENDER, pct, fill=var)) + geom_bar(stat="identity", position="dodge") + scale_y_continuous(labels=percent_format())+ scale_x_discrete(drop=FALSE) – vashts85 May 03 '16 at 18:18
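A sketch of one way to handle this follow-up (not from the original thread), assuming dplyr 0.8.0 or later for count(.drop = FALSE). In the attempt above, scale_x_discrete(drop=FALSE) acts on the x aesthetic (GENDER), not on fill (var). The sketch instead counts every GENDER x var combination including zeros, computes shares within GENDER, and tells the fill scale not to drop unused levels. The empty "3: Unknown" gender level is dropped first so it does not produce 0/0 rows; pct_by_gender and p are illustrative names.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Rebuild the 20-row sample from the question (same codes and labels as the dput() output)
var_labels <- c("-97:\nMultiple\nResponse", "-99:\nRefused", "1:\nDefinitely",
                "2:\nProbably", "3:\nProbably\nnot", "4:\nDefinitely\nnot")
gender_labels <- c("1: Male", "2: Female", "3: Unknown")
dat <- data.frame(
  var = factor(c(5, 5, 5, 6, 5, 4, 5, 6, 6, 6, 5, 5, 5, 6, 6, 5, 6, 5, 6, 5),
               levels = 1:6, labels = var_labels),
  GENDER = factor(c(1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1),
                  levels = 1:3, labels = gender_labels)
)

pct_by_gender <- dat %>%
  mutate(GENDER = droplevels(GENDER)) %>%  # drop "3: Unknown", which has no rows (avoids 0/0)
  count(GENDER, var, .drop = FALSE) %>%    # keep zero-count combinations for every var level
  group_by(GENDER) %>%
  mutate(pct = n / sum(n)) %>%             # denominator: total within each gender
  ungroup()

p <- ggplot(pct_by_gender, aes(GENDER, pct, fill = var)) +
  geom_col(position = "dodge") +           # zero rows still reserve a dodge slot
  scale_y_continuous(labels = percent_format()) +
  scale_fill_discrete(drop = FALSE)        # keep every var level in the legend
print(p)
```

Because the zero-count rows are present in the data, each gender gets a slot for all six levels of var, and the bars for each gender add up to 100%.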