R: Melting a data frame and plotting by group

Question

I have a large dataset that I'd like to use to plot genetic divergence along chromosomes. The data frame I am using has the following format.

ID      Group   100     270     310     430     460     550     580     660     710     740
Strain1 A       0.191   0.147   0.124   0.149   0.193   0.189   0.123   0.189   0.151   0.180
Strain2 A       0.188   0.188   0.149   0.136   0.000   0.199   0.199   0.188   0.149   0.000
Strain3 B       0.123   0.147   0.190   0.061   0.148   0.149   0.148   0.197   0.178   0.172
Strain4 B       0.147   0.197   0.188   0.178   0.179   0.149   0.191   0.154   0.179   0.187

I'd like to use ggplot2 to plot a line for each strain, with the lines colored according to group affiliation, and a continuous x-axis running from chromosome positions 100 through 740. I cannot figure out how to melt the data without extracting the group info first and then adding it back after melting. Can anyone suggest a one-step approach to accomplish this?

Not clear about the expected output. Maybe `library(tidyr); gather(df1, key, val, 3:ncol(df1))` – akrun, Mar 20 '19 at 04:53

score 1 · Answer 1 · answered Mar 20 '19 at 04:59

We could gather into 'long' format and then plot with ggplot

library(ggplot2)
library(dplyr)
library(tidyr)
gather(df1, key, val, 3:ncol(df1)) %>% 
   mutate(key = as.numeric(key)) %>%
   ggplot() + 
     geom_line(aes(x = key, y = val, group = Group, color = Group))

akrun

I think the grouping variable needs to be `ID` so that you get one line per strain, while the colour is mapped to `Group`, i.e. `geom_line(aes(x = key, y = val, group = ID, color = Group))`. With the example data that should give four lines in two colours. – Croote Mar 20 '19 at 05:13
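
A minimal, self-contained sketch of the change suggested in this comment, using the example rows from the question. It assumes the data are read with `check.names = FALSE` so the column headers stay as "100", "270", and so on, rather than becoming "X100". The object names `long` and `p` are only illustrative.

```r
library(tidyr)
library(ggplot2)

# Example data from the question. check.names = FALSE keeps the numeric
# column headers as they are instead of prefixing them with "X".
df1 <- read.table(text = "
ID      Group   100     270     310     430     460     550     580     660     710     740
Strain1 A       0.191   0.147   0.124   0.149   0.193   0.189   0.123   0.189   0.151   0.180
Strain2 A       0.188   0.188   0.149   0.136   0.000   0.199   0.199   0.188   0.149   0.000
Strain3 B       0.123   0.147   0.190   0.061   0.148   0.149   0.148   0.197   0.178   0.172
Strain4 B       0.147   0.197   0.188   0.178   0.179   0.149   0.191   0.154   0.179   0.187
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Reshape everything except ID and Group into long format, so Group
# never has to be split off and re-attached.
long <- gather(df1, key, val, -ID, -Group)

# The former column headers are chromosome positions; make them numeric
# so the x-axis is continuous.
long$key <- as.numeric(long$key)

# group = ID draws one line per strain; color = Group colours by group.
p <- ggplot(long, aes(x = key, y = val, group = ID, color = Group)) +
  geom_line()
print(p)
```

Note that `ID` is unquoted inside `aes()`. A quoted `"ID"` would be treated as a constant string, which puts every row in the same group.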

score 1 · Answer 2 · answered Mar 20 '19 at 05:08

I think this will work best if you colour by Group and facet on Strain. Assuming dataframe is named mydata:

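library(dplyr)
library(tidyr)
library(ggplot2)

mydata %>% 
  # reshape every column except ID and Group into long format
  gather(Var, Val, -Group, -ID) %>% 
  # headers are chromosome positions (assumes names like "100", not "X100")
  mutate(Var = as.numeric(Var)) %>% 
  ggplot(aes(Var, Val)) + 
  geom_line(aes(color = Group, group = Group)) + 
  # one panel per strain
  facet_grid(ID ~ .)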

DrDNA · Answer 3 · 2019-03-21T04:02:28.397

The answer by akrun is almost there, except that there should be one line plotted for each strain.

For more information, here's a link to a screenshot (sorry, I need more rep to post the actual image) of a Shiny app I'm working on. It plots chromosome similarity between a selected fungal strain and a collection of other strains that infect different host grass species (Shiny App plot). The current plot shows genetic divergence between strain 87-120 and 10 rice (Oryza)-infecting strains (colored in red), 7 St. Augustinegrass (Stenotaphrum)-infecting strains (in dark blue) and 8 finger millet (Eleusine)-infecting strains (in light blue).

My current problem is that the x-axis values do not represent chromosome positions (they are the analysis window numbers instead). I need to melt (or gather) the data frame fields so that I can use the chromosome position information in the headers for the x-axis and the Group information for the color.
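
One possible single-step reshape for this, sketched with `pivot_longer()` from tidyr. It assumes tidyr 1.1.0 or later (for `names_transform`) and that the headers were read with `check.names = FALSE`. The column names `position` and `divergence` are illustrative choices, not taken from the original data.

```r
library(tidyr)
library(ggplot2)

# Example data from the question, with headers kept as "100", "270", ...
df1 <- read.table(text = "
ID      Group   100     270     310     430     460     550     580     660     710     740
Strain1 A       0.191   0.147   0.124   0.149   0.193   0.189   0.123   0.189   0.151   0.180
Strain2 A       0.188   0.188   0.149   0.136   0.000   0.199   0.199   0.188   0.149   0.000
Strain3 B       0.123   0.147   0.190   0.061   0.148   0.149   0.148   0.197   0.178   0.172
Strain4 B       0.147   0.197   0.188   0.178   0.179   0.149   0.191   0.154   0.179   0.187
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Pivot every column except ID and Group, converting the header text to
# numeric chromosome positions in the same call.
long <- pivot_longer(
  df1,
  cols = -c(ID, Group),
  names_to = "position",
  values_to = "divergence",
  names_transform = list(position = as.numeric)
)

# One line per strain (group = ID), coloured by Group, on a continuous x-axis.
p <- ggplot(long, aes(position, divergence, group = ID, colour = Group)) +
  geom_line() +
  labs(x = "Chromosome position", y = "Divergence")
print(p)
```

Because `ID` and `Group` are excluded from the pivot rather than dropped, both stay attached to every row, so nothing has to be extracted before reshaping and joined back afterwards.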
