I have a data frame containing several binary (0/1) variables. I want to calculate the percentage of 1's in each column and then plot those values.

My data frame looks something like this:

name   bin1  bin2
a       1     0
b       0     1
c       0     1

I want to calculate the percentage for bin1, bin2 and then only plot the percentage of 1's (or yeses) for bin1 and bin2 in a bar graph.

My current code is extremely clunky, and I want something that iterates over the variables and calculates the percentage for each one. So far I have calculated the percentages by hand and plotted them:

n <- length(df$bin1)
# close c(...) before the plot arguments; lwd (not width) sets the line width
plot(c(sum(df$bin1)/n, sum(df$bin2)/n), main = "My Title", ylab = "Percentage", type = "h", lwd = 5)

This leads to a pretty ugly graph and I just generally don't like how clunky it is.

Please let me know if this is unclear. Thanks!

1 Answer

Since the columns contain only 1/0 values, the mean of each column is the proportion of 1's; multiplying by 100 gives the percentage. Use barplot to plot it.

barplot(colMeans(df[-1]) * 100, ylim = c(0, 100), ylab='Percentage',
         xlab = 'bins', main = 'Percentage of yes')

data

df <- structure(list(name = c("a", "b", "c"), bin1 = c(1L, 0L, 0L), 
    bin2 = c(0L, 1L, 1L)), class = "data.frame", row.names = c(NA, -3L))
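
As a quick check of the idea, here is a minimal sketch that applies colMeans to the same three-row sample data (rebuilt with data.frame for readability). The name pct is only an illustrative variable, not something from the question.

```r
# Sample data, same values as the structure() call above
df <- data.frame(name = c("a", "b", "c"),
                 bin1 = c(1L, 0L, 0L),
                 bin2 = c(0L, 1L, 1L))

# The mean of a 0/1 column is the share of 1's; df[-1] drops the name column
pct <- colMeans(df[-1]) * 100

round(pct, 1)
# bin1 bin2
# 33.3 66.7
```

Because colMeans works on every remaining column at once, no explicit loop over bin1, bin2, ... is needed.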
Ronak Shah
  • Thank you! That worked, I didn't realize I could just use colMeans since it's binary. Just had to use na.rm = TRUE since I had some NA values. – drama llama Sep 02 '20 at 15:54
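
For readers who also have missing values, the following sketch shows what the na.rm = TRUE adjustment mentioned in the comment might look like. The data frame df_na and its values are hypothetical, made up only for illustration.

```r
# Hypothetical data with one missing value in bin1
df_na <- data.frame(name = c("a", "b", "c", "d"),
                    bin1 = c(1L, NA, 0L, 0L),
                    bin2 = c(0L, 1L, 1L, 1L))

# Without na.rm, a column containing NA gives NA
colMeans(df_na[-1]) * 100

# With na.rm = TRUE, NAs are dropped column by column before averaging
pct_na <- colMeans(df_na[-1], na.rm = TRUE) * 100

barplot(pct_na, ylim = c(0, 100), ylab = "Percentage",
        xlab = "bins", main = "Percentage of yes")
```

Note that with na.rm = TRUE each percentage is taken over the non-missing rows of that column, so the denominators can differ between columns.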