I have a large dataset that I'd like to use to plot genetic divergence along chromosomes. The data frame I am using has the following format.

ID      Group   100     270     310     430     460     550     580     660     710     740
Strain1 A       0.191   0.147   0.124   0.149   0.193   0.189   0.123   0.189   0.151   0.180
Strain2 A       0.188   0.188   0.149   0.136   0.000   0.199   0.199   0.188   0.149   0.000
Strain3 B       0.123   0.147   0.190   0.061   0.148   0.149   0.148   0.197   0.178   0.172
Strain4 B       0.147   0.197   0.188   0.178   0.179   0.149   0.191   0.154   0.179   0.187

I'd like to use ggplot2 to plot a line for each strain, with the lines colored according to group affiliation, and a continuous x-axis running from chromosome positions 100 through 740. I cannot figure out how to melt the data without extracting the group info first and then adding it back after melting. Can anyone suggest a one-step approach to accomplish this?

DrDNA
    Not clear about the expected output. Maybe `library(tidyr); gather(df1, key, val, 3:ncol(df1))` – akrun Mar 20 '19 at 04:53

3 Answers

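We could gather the data into 'long' format and then plot it with ggplot:

library(ggplot2)
library(dplyr)
library(tidyr)

# Reshape the position columns (3 onward) to long format, keeping ID and Group
gather(df1, key, val, 3:ncol(df1)) %>%
   # The column headers arrive as strings; convert them to numeric positions
   mutate(key = as.numeric(key)) %>%
   ggplot() +
     geom_line(aes(x = key, y = val, group = Group, color = Group))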
akrun
    I think the grouping variable needs to be `ID` so that you get one line per strain, and the colour needs to be `Group`, i.e. `geom_line(aes(x = key, y = val, group = ID, color = Group))`. With the example data, that should give four lines and 2 colours. – Croote Mar 20 '19 at 05:13
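
As a minimal sketch of what the comment above suggests (not the answerer's own code), the following rebuilds the example data from the question and groups the lines by `ID` while coloring by `Group`. It assumes the data was read with `check.names = FALSE`, so that the position headers stay as "100", "270", and so on; the object names `long1` and `p1` are only illustrative.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Example data copied from the question. check.names = FALSE (an assumption
# about how the data is read) keeps headers as "100" rather than "X100".
df1 <- read.table(text = "ID Group 100 270 310 430 460 550 580 660 710 740
Strain1 A 0.191 0.147 0.124 0.149 0.193 0.189 0.123 0.189 0.151 0.180
Strain2 A 0.188 0.188 0.149 0.136 0.000 0.199 0.199 0.188 0.149 0.000
Strain3 B 0.123 0.147 0.190 0.061 0.148 0.149 0.148 0.197 0.178 0.172
Strain4 B 0.147 0.197 0.188 0.178 0.179 0.149 0.191 0.154 0.179 0.187",
  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Gather the position columns; ID and Group are carried along untouched
long1 <- gather(df1, key, val, 3:ncol(df1)) %>%
  mutate(key = as.numeric(key))  # header strings -> numeric positions

# group = ID draws one line per strain; color = Group colors by group
p1 <- ggplot(long1) +
  geom_line(aes(x = key, y = val, group = ID, color = Group))

print(p1)
```

Note that `ID` is unquoted inside `aes()`; a quoted `"ID"` would be treated as a constant and would not split the data into one line per strain.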

I think this will work best if you colour by Group and facet on strain (the ID column). Assuming the data frame is named mydata:

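library(tidyr)
library(ggplot2)

mydata %>% 
  # convert = TRUE turns the position headers ("100", "270", ...) into numbers,
  # so the x-axis is continuous rather than evenly spaced categories
  gather(Var, Val, -Group, -ID, convert = TRUE) %>% 
  ggplot(aes(Var, Val)) + 
  geom_line(aes(color = Group, group = Group)) + 
  facet_grid(ID ~ .)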

neilfws

The answer by akrun is almost there, except that one line should be plotted for each strain. For more information, here's a link to a screenshot (sorry, I need more rep to post the actual image) of a Shiny app I'm working on, which plots chromosome similarity between a selected fungal strain and a collection of other strains that infect different host grass species: Shiny App plot. The current plot shows genetic divergence between strain 87-120 and 10 rice (Oryza)-infecting strains (red), 7 St. Augustinegrass (Stenotaphrum)-infecting strains (dark blue), and 8 finger millet (Eleusine)-infecting strains (light blue). My current problem is that the x-axis values do not represent chromosome positions (they are analysis window numbers instead). I need to melt (or gather) the data frame so that I can use the chromosome position information in the column headers for the x-axis and the Group information for the color.
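
One possible way to get there in a single reshaping step is sketched below, using `pivot_longer()`, the newer tidyr replacement for `gather()`. This is an assumption-laden sketch rather than a tested solution for the Shiny app: it assumes tidyr 1.1.0 or later (for `names_transform`), that the headers were kept as plain numbers via `check.names = FALSE`, and it uses the four example strains from the question. The names `wide`, `long2`, `position`, and `divergence` are illustrative only.

```r
library(tidyr)
library(ggplot2)

# Example data from the question, kept in wide format
wide <- read.table(text = "ID Group 100 270 310 430 460 550 580 660 710 740
Strain1 A 0.191 0.147 0.124 0.149 0.193 0.189 0.123 0.189 0.151 0.180
Strain2 A 0.188 0.188 0.149 0.136 0.000 0.199 0.199 0.188 0.149 0.000
Strain3 B 0.123 0.147 0.190 0.061 0.148 0.149 0.148 0.197 0.178 0.172
Strain4 B 0.147 0.197 0.188 0.178 0.179 0.149 0.191 0.154 0.179 0.187",
  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Reshape everything except ID and Group; the header strings are converted
# to numeric chromosome positions in the same call
long2 <- pivot_longer(
  wide,
  cols = -c(ID, Group),
  names_to = "position",
  values_to = "divergence",
  names_transform = list(position = as.numeric)
)

# One line per strain (group = ID), colored by group, continuous x-axis
p2 <- ggplot(long2, aes(x = position, y = divergence,
                        group = ID, color = Group)) +
  geom_line() +
  labs(x = "Chromosome position", y = "Divergence")

print(p2)
```

Because `ID` and `Group` are excluded from `cols`, they are repeated on every long-format row, so there is no need to pull the group information out before reshaping and add it back afterwards.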

DrDNA