I have two large data frames with numerous columns of class character and/or factor. I need to compare the frequencies of the values they take in the first and the second data frame by overlaying the frequencies of each pair of variables on the same bar plot. I would like to be able to plot either the count or the proportion.

I am able to plot each one separately.

library(ggplot2)
ds1 <- data.frame(var1 = as.character(c("7","10","11","4", "7","10","11","4")))
ds2 <- data.frame(var2 = c("4","4","7","7", "7","10","11","4"))
ggplot(ds1, aes(var1)) + geom_bar()
ggplot(ds2, aes(var2)) + geom_bar()

But I am struggling to:

  1. put the two together
  2. add transparency so both pairs of bars are visible
  3. plot proportion instead of count
DGenchev

1 Answer

Here is a way to do it with the bars made semi-transparent and overlaid. I think it may be a little clearer to put the bars next to each other; if you prefer that, change position_identity() to position_dodge():

library(ggplot2)
ds1 <- data.frame(var1 = as.character(c("7","10","11","4", "7","10","11","4")))
ds2 <- data.frame(var2 = c("4","4","7","7", "7","10","11","4"))

# Combine side by side (requires equal row counts), then reshape to long format
plot.df <- cbind(ds1, ds2)
plot.df <- reshape2::melt(plot.df, id.vars = NULL)

# Overlay semi-transparent bars; prop is computed within each group
ggplot(plot.df, aes(value, group = variable, fill = variable)) +
  geom_bar(position = position_identity(),
           aes(y = after_stat(prop)),  # ggplot2 >= 3.3.0; older versions use ..prop..
           alpha = .6,
           color = 'black') +
  theme_minimal() + ggtitle("Comparing the Frequency of Categories")
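
Since the question asks for either counts or proportions, and the bars can be overlaid or dodged, the variants can be wrapped in one small function. This is only a sketch: `plot_freq` and its arguments are hypothetical names rather than part of ggplot2, the long-format data is typed in directly instead of being built with `melt()`, and `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Long-format data: one row per observation; `variable` marks the source column.
plot.df <- data.frame(
  value    = c("7", "10", "11", "4", "7", "10", "11", "4",
               "4", "4", "7", "7", "7", "10", "11", "4"),
  variable = rep(c("var1", "var2"), each = 8)
)

# Hypothetical helper (not a ggplot2 function): one bar per category and group,
# drawn as proportions or counts, overlaid or side by side.
plot_freq <- function(df, proportion = TRUE, dodge = FALSE) {
  y_map <- if (proportion) aes(y = after_stat(prop)) else aes(y = after_stat(count))
  ggplot(df, aes(value, group = variable, fill = variable)) +
    geom_bar(y_map,
             position = if (dodge) position_dodge() else position_identity(),
             alpha = if (dodge) 1 else 0.6,  # transparency only matters when overlaid
             color = "black") +
    theme_minimal()
}

p_prop  <- plot_freq(plot.df)                                    # overlaid proportions
p_count <- plot_freq(plot.df, proportion = FALSE, dodge = TRUE)  # side-by-side counts
```

Printing `p_prop` or `p_count` draws the plot. Keeping `group = variable` in the aesthetics is what makes `prop` a within-group proportion rather than a share of all rows.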

Edit: For the case where your data.frames have different numbers of rows:

ds1$variable <- "ds1"
ds2$variable <- "ds2"

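# Use the column names the plot code expects (value, variable) in both data frames
names(ds1) <- names(ds2) <- c("value", "variable")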

plot.df <- rbind(ds1, ds2)

and then plot from here.
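
Put together, the row-binding approach might look like the sketch below. The data are made up for illustration (8 rows versus 5), and the `prop.table()` cross-check is an addition of mine to confirm what the bars should show; it is not needed for the plot itself.

```r
library(ggplot2)

# Illustrative data frames with different numbers of rows (values are made up).
ds1 <- data.frame(var1 = c("7", "10", "11", "4", "7", "10", "11", "4"))
ds2 <- data.frame(var2 = c("4", "4", "7", "7", "10"))

# Stack into long format: `value` holds the category, `variable` the source.
plot.df <- rbind(
  data.frame(value = ds1$var1, variable = "ds1"),
  data.frame(value = ds2$var2, variable = "ds2")
)

# Cross-check: proportions within each source data frame (each row sums to 1).
props <- prop.table(table(plot.df$variable, plot.df$value), margin = 1)

# Same plot as before; prop is computed separately for ds1 and ds2.
p <- ggplot(plot.df, aes(value, group = variable, fill = variable)) +
  geom_bar(aes(y = after_stat(prop)),
           position = position_identity(),
           alpha = 0.6,
           color = "black") +
  theme_minimal()
```

Because proportions are computed within each group, the two sets of bars stay comparable even though the data frames differ in size; with counts, the larger data frame would dominate.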

Created on 2018-05-10 by the reprex package (v0.2.0).

gfgm
  • Looks just like what I need. However, I can't use `cbind` because the number of rows may differ between the two data frames. Is there a way around this? – DGenchev May 09 '18 at 21:26
  • @DGenchev sure, you can do it in 3 steps: 1.) make a variable in each dataframe that will indicate which data.frame the values are from, 2.) make sure that this key and the values in the two data.frames have the same names (e.g. key, value), 3.) `rbind()` the two columns from the two data.frames together – gfgm May 09 '18 at 21:29
  • @DGenchev I added some code on the end of the answer to illustrate a way around. – gfgm May 09 '18 at 21:33
  • Sorry for replying late. Just got back to this. I really appreciate your solution and the extra help provided. Thank you so much! – DGenchev May 10 '18 at 20:21