I'm working with two frequency tables named identified_modification_table and unidentified_modifications_table.

The files are structured roughly like this:

identified_modification_table

Modifications   | Frequency
MOD:42123       | 12
MOD:1234        | 7
MOD:7618        | 36
MOD:411232      | 51

unidentified_modifications_table

Modifications   | Frequency
MOD:42123       | 12  
MOD:12          | 20
MOD:7618        | 36
MOD:411232      | 51

I would like to merge these tables into the output below, so that I can draw a stacked barplot like the example that follows.

Modifications   | Frequency.1 | Frequency.2 
MOD:42123       | 12          | 12
MOD:1234        | 7           | NA
MOD:12          | NA          | 20
MOD:7618        | 36          | 36
MOD:411232      | 51          | 51

[Image: example stacked barplot]

I tried the following code to merge the tables and insert NA where a value doesn't exist:

df_final <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), ]);

But this doesn't work properly and I don't know why.

After this, I think I should just use melt and a ggplot2 stacked bar:

df_barplot <- melt(df, measure.vars = names(df))

ggplot(df_barplot, aes(x = value, fill = variable)) +
    geom_bar(stat = "count", position = "dodge") +
    theme(axis.text.x = element_text(angle = 20, hjust = 0.5, vjust = -0.1)) +
    guides(fill = FALSE) +
    labs("Barplot") +
    xlab("Values") +
    ylab("Frequency") +
    theme(text = element_text(size = 18),
          axis.text.x = element_text(angle = 90, hjust = 1, size = 15),
          axis.text.y = element_text(size = 15))

Does anyone know how I could do this?

Here is a reproducible example:

df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))

df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

Thank you

Enrique
  • You could use `data <- merge(df1, df2, by = "modifications", all = T, sort = F)` – Miha Mar 30 '17 at 13:19
  • Do you want to keep all levels in the plot if one of the two dataframes has an NA value? I've added an answer assuming that you do, but if you don't you could specify `merge(df1,df2,all = F)` – Niek Mar 30 '17 at 13:50

2 Answers

Here's the tidyverse way:

library(tidyverse)
merged_df <- full_join(df1, df2, by = "modifications")
merged_df <- gather(merged_df, key = Category, value = Frequency, -modifications)

And the chart:

ggplot(merged_df, aes(x = modifications, y = Frequency, fill = Category)) +
  geom_col(position = "dodge")

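Since the question asks for a stacked barplot, here is a minimal sketch of the same pipeline with `position = "stack"` instead of `"dodge"`. It assumes tidyr 1.0.0 or later, where `pivot_longer()` supersedes `gather()`. The `suffix` values `.1` and `.2` are chosen only to mirror the column names in the question's desired output, and `long_df` is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the question's reproducible example
df1 <- data.frame(
  modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
  Frequency = c(1, 41, 616, 727, 828, 8993, 383)
)
df2 <- data.frame(
  modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
  Frequency = c(1, 43, 64, 77, 88, 893, 38)
)

# Full join keeps modifications from either table (NA where one is missing),
# then reshape to long format: one row per modification and source table
long_df <- full_join(df1, df2, by = "modifications", suffix = c(".1", ".2")) %>%
  pivot_longer(cols = -modifications, names_to = "Category", values_to = "Frequency")

# Stacked bars; na.rm = TRUE drops the NA frequencies without a warning
ggplot(long_df, aes(x = modifications, y = Frequency, fill = Category)) +
  geom_col(position = "stack", na.rm = TRUE)
```

With frequencies this far apart (1 up to 8993), the small bars will be hard to see in either layout, so a dodged chart or a transformed y axis may still be the more readable choice.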

Phil

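I think this does what you want:

# Full outer join on the shared key; all = TRUE keeps modifications found in only one table
df3 <- merge(df1, df2, by = "modifications", all = TRUE)

library(reshape2)
# Reshape to long format, naming the id column explicitly
df3 <- melt(df3, id.vars = "modifications")
df3$variable <- factor(df3$variable, labels = c("modifications1", "modifications2"))

library(ggplot2)
ggplot(df3, aes(x = modifications, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")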

Edit: added `all = TRUE` to keep all modifications that occur in either table.

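For comparison with the attempt in the question, a small base R sketch on the question's reproducible data shows what the `match()` approach leaves out and what a full outer join brings back. The names `left_only`, `full`, and `lost` are illustrative only.

```r
# Data from the question's reproducible example
df1 <- data.frame(
  modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44", "MOD:123", "MOD:123", "MOD:212"),
  Frequency = c(1, 41, 616, 727, 828, 8993, 383)
)
df2 <- data.frame(
  modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445", "MOD:12", "MOD:123", "MOD:212"),
  Frequency = c(1, 43, 64, 77, 88, 893, 38)
)

# match() approach from the question: one output row per row of df1,
# so modifications present only in df2 never appear in the result
left_only <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), ])

# Full outer join: keeps modifications from either table, NA where one is missing
full <- merge(df1, df2, by = "modifications", all = TRUE)

# Modifications the match() approach loses: MOD:12 and MOD:445
lost <- setdiff(full$modifications, left_only$modifications)
lost
```

The `match()` result also carries two columns named `modifications` and two named `Frequency`, whereas `merge()` disambiguates them as `Frequency.x` and `Frequency.y`.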

Niek