Create a stacked bar using a frequency table

Question

I'm working with two frequency tables named identified_modification_table and unidentified_modifications_table.

The structure of these tables looks something like this:

identified_modification_table

Modifications   | Frequency
MOD:42123       | 12
MOD:1234        | 7
MOD:7618        | 36
MOD:411232      | 51

unidentified_modifications_table

Modifications   | Frequency
MOD:42123       | 12  
MOD:12          | 20
MOD:7618        | 36
MOD:411232      | 51

I would like to merge these tables into the following output, in order to create a stacked bar plot like this example.

Modifications   | Frequency.1 | Frequency.2 
MOD:42123       | 12          | 12
MOD:1234        | 7           | NA
MOD:12          | NA          | 20
MOD:7618        | 36          | 36
MOD:411232      | 51          | 51

I tried to merge the tables with this code, adding NA where a value doesn't exist:

df_final <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), ]);

But this doesn't work properly and I don't know why.
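A likely explanation, sketched with the reproducible data from this question (the object names `idx` and `only_in_df2` are illustrative): `match()` returns one position in `df2` for each row of `df1`, so the result always has `nrow(df1)` rows and can never contain modifications that exist only in `df2`. `cbind` also keeps both `modifications` columns side by side.

```r
# Reproducible data from the question
df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))
df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

# For each element of df1$modifications: its row in df2, or NA if absent
idx <- match(df1$modifications, df2$modifications)
print(idx)

# Modifications present only in df2 are never looked up, so they are lost
only_in_df2 <- setdiff(df2$modifications, df1$modifications)
print(only_in_df2)

# The cbind result has one row per row of df1 and two "modifications" columns
df_final <- cbind.data.frame(df1, df2[idx, ])
print(names(df_final))
```

A full outer join (for example `merge(..., all = TRUE)`, as in the comments and answers) is what keeps the rows from both tables.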

After that, I think I should just use melt and a ggplot2 stacked bar chart:

df_barplot <- melt(df,measure.vars = names(df))

ggplot((df_barplot), aes(x = value, fill = variable)) + 
    geom_bar(stat = "count", position = "dodge") + 
    theme(axis.text.x = element_text(angle = 20, hjust = 0.5, vjust = -0.1)) + 
    guides(fill = "none") +
    labs(title = "Barplot") + 
    xlab("Values")+
    ylab("Frequency")+
    theme(text = element_text(size=18), axis.text.x = element_text(angle = 90, hjust = 1, size = 15), axis.text.y=element_text(size = 15))

Does anyone know how I could do this?

Here is a reproducible example:

df1 <- data.frame(modifications=c("MOD:214", "MOD:3","MOD:24","MOD:44","MOD:123", "MOD:123", "MOD:212"), Frequency=c(1,41,616,727,828,8993,383))

df2 <- data.frame(modifications=c("MOD:214", "MOD:3","MOD:24","MOD:445","MOD:12", "MOD:123", "MOD:212"), Frequency=c(1,43,64,77,88,893,38))

Thank you

You could use `data <- merge(df1, df2, by = "modifications", all = T, sort = F)` — Miha, Mar 30 '17 at 13:19
Do you want to keep all levels in the plot if one of the two data frames has an NA value? I've added an answer assuming that you do, but if you don't, you could specify `merge(df1, df2, by = "modifications", all = F)` — Niek, Mar 30 '17 at 13:50
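A small base-R sketch of the difference between the two options, using the question's data (the object names `full`, `inner` and `both_cols` are illustrative). Note that without a `by` argument, `merge()` joins on every column the two data frames share, which here is both `modifications` and `Frequency`, so only rows where both values agree survive.

```r
df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))
df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

# Outer join: every modification from either table, NA where one is missing
full <- merge(df1, df2, by = "modifications", all = TRUE)

# Inner join: only modifications present in both tables
inner <- merge(df1, df2, by = "modifications", all = FALSE)

# No `by`: joins on modifications AND Frequency (all shared columns)
both_cols <- merge(df1, df2, all = FALSE)

print(full)
print(nrow(inner))
print(both_cols)
```

Because `MOD:123` appears twice in `df1`, it produces two rows in both joins.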

score 2 · Answer 1 · answered Mar 30 '17 at 13:19

Here's the tidyverse way:

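library(tidyverse)
# Outer join: keeps modifications from both tables, NA where one is missing
merged_df <- full_join(df1, df2, by = "modifications")
# Long format: one row per modification and source table
merged_df <- gather(merged_df, key = Category, value = Frequency, -modifications)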
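In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. A sketch of the same reshape with it, assuming dplyr and tidyr are installed; the resulting columns are named as in the answer (`Category`, `Frequency`):

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))
df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

# Outer join; the two value columns become Frequency.x and Frequency.y
merged_df <- full_join(df1, df2, by = "modifications")

# Every column except modifications is stacked into Category / Frequency
merged_df <- pivot_longer(merged_df, cols = -modifications,
                          names_to = "Category", values_to = "Frequency")
print(merged_df)
```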

And the chart:

ggplot(merged_df, aes(x = modifications, y = Frequency, fill = Category)) + 
  geom_col(position = "dodge")
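The question title asks for a stacked bar, while `position = "dodge"` places the bars side by side. A minimal sketch of the stacked variant, assuming long-format data with columns `modifications`, `Category` and `Frequency`; `long_df` is a hypothetical name, filled here with the values from the desired output table in the question:

```r
library(ggplot2)

# Desired output from the question, in long format
long_df <- data.frame(
  modifications = rep(c("MOD:42123", "MOD:1234", "MOD:12", "MOD:7618", "MOD:411232"), times = 2),
  Category = rep(c("Frequency.1", "Frequency.2"), each = 5),
  Frequency = c(12, 7, NA, 36, 51, 12, NA, 20, 36, 51)
)

# Drop missing values up front; otherwise ggplot2 removes them with a warning
long_df <- long_df[!is.na(long_df$Frequency), ]

# "stack" (geom_col's default) piles the two categories on top of each other
p <- ggplot(long_df, aes(x = modifications, y = Frequency, fill = Category)) +
  geom_col(position = "stack")
print(p)
```

With stacking, the bar for `MOD:411232` reaches 102 (51 + 51) instead of two bars of 51.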

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

I think this does what you want:

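# Outer join on the modification name; all = T keeps rows found in either table
df3 <- merge(df1, df2, by = "modifications", all = T)

library(reshape2)
# Long format: modifications is the id, Frequency.x / Frequency.y go into `variable`
df3 <- melt(df3, id.vars = "modifications")
# Give the two source tables readable labels
df3$variable <- factor(df3$variable, labels = c("modifications1", "modifications2"))

library(ggplot2)
ggplot(df3, aes(x = modifications, y = value, fill = variable)) + 
  geom_bar(stat = "identity", position = "dodge")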

Edit: added `all = T` to keep the modifications that occur in only one of the two tables.
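If ggplot2 is not a requirement, base R's `barplot()` stacks the rows of a matrix by default (`beside = FALSE`). A sketch under that assumption: `counts` is a hypothetical name, and NA is replaced by 0 only for plotting. Because `df1` in the reproducible example lists `MOD:123` twice, that modification appears as two bars.

```r
df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))
df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

df3 <- merge(df1, df2, by = "modifications", all = TRUE)

# One column per modification, one row per source table
counts <- t(as.matrix(df3[, c("Frequency.x", "Frequency.y")]))
counts[is.na(counts)] <- 0
colnames(counts) <- df3$modifications
rownames(counts) <- c("modifications1", "modifications2")

# Stacked bars; legend uses the row names, las = 2 rotates the axis labels
barplot(counts, legend.text = TRUE, las = 2)
```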
