I have data in the following format. Column V1 is the genomic location of interest, and columns V4 and V5 are the minor allele frequencies at two different points in time. I would like to make a simple xy scatter plot with a line connecting the allele frequency for each specific location from timepoint 1 to timepoint 2 (plotted on the y-axis). (Note: I actually have hundreds to thousands of data points.)

   V1    V2      V3          V4          V5
1 153 1/113   1/115 0.008849558 0.008695652
2 390 0/176 150/152 0.000000000 0.986842105
3 445 1/149   1/152 0.006711409 0.006578947
4 507 0/154 144/146 0.000000000 0.986301370
5 619 1/103  99/101 0.009708738 0.980198020
6 649 0/138 120/123 0.000000000 0.975609756

I feel like I should be able to accomplish this with ggplot, but I am not sure how to go about doing so, as I don't know how to specify two y-values for each genomic position, nor specify a column as a category. I suspect the data needs to be reshaped somehow. Any help or suggestions are greatly appreciated!


Update:

Thanks to all who gave me suggestions. I don't think I was very clear about wanting the time points to be my x-axis as opposed to the genomic position - my apologies. Hopefully this picture clarifies that!

I have successfully generated the plot I wished to make with the following code:

ggplot(dat) + geom_segment(aes(x="timepoint 1", y=V4, xend="timepoint 2", yend=V5))

and this is what the plot looks like with more data points...

[Figure: allele frequency trajectories]

I haven't changed the axes titles and played with margins yet, but this is the general idea!
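
For the axis titles, a minimal sketch of the same geom_segment() call with labels added. It assumes the example data above is in a data frame called dat (rebuilt here so the snippet runs on its own); the label text and the 0 to 1 y-range are my own choices, not anything required by ggplot2.

```r
library(ggplot2)

# Example frequencies from the table above (V1-V3 are not needed for this plot)
dat <- data.frame(
  V4 = c(0.008849558, 0.000000000, 0.006711409, 0.000000000, 0.009708738, 0.000000000),
  V5 = c(0.008695652, 0.986842105, 0.006578947, 0.986301370, 0.980198020, 0.975609756)
)

# One segment per location, running from its timepoint 1 value to its timepoint 2 value
p <- ggplot(dat) +
  geom_segment(aes(x = "timepoint 1", y = V4, xend = "timepoint 2", yend = V5)) +
  labs(x = "Time point", y = "Minor allele frequency") +  # axis titles (assumed wording)
  ylim(0, 1)                                               # frequencies are proportions
print(p)
```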

ONeillMB1
  • As you are rather new on SO, please take some time to read [about Stackoverflow](http://stackoverflow.com/about) and [what to ask](http://stackoverflow.com/help/on-topic). As you will find in these two links, you should "show your work", and "Questions asking for code must include attempted solutions, why they didn't work". Thanks for providing a small, dummy data set! – Henrik Nov 08 '13 at 22:35
  • These don't really seem to be what most people would call "time series". – IRTFM Nov 08 '13 at 23:06

3 Answers

1

If your example data was in DF, then

ggplot(DF) +
  geom_segment(aes(x=V4, y="timepoint 1", xend=V5, yend="timepoint 2"))

Brian Diggs
  • Thank you! I am going to play with this tomorrow and will report back if it does what I am after tomorrow. I've never used geom_segment(), so I'm excited to try this out! – ONeillMB1 Nov 11 '13 at 05:24
  • This works great! I flipped the axes around, and have plotted it with all the data points. I have one data set with three timepoints so next I will be playing with geom_path or geom_line it looks like. Thanks for cluing me in on these functions/layers of ggplot. – ONeillMB1 Nov 12 '13 at 16:19
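
For more than two timepoints, one possible approach (a sketch, not necessarily what was used in the end) is to reshape the data to long format, with one row per location and timepoint, and let geom_line() connect the points within each location via the group aesthetic. The column names location, timepoint and freq are made up for this illustration; a third timepoint would just be one more block of rows.

```r
library(ggplot2)

# Example data from the question (only the columns needed here)
dat <- data.frame(
  V1 = c(153, 390, 445, 507, 619, 649),
  V4 = c(0.008849558, 0.000000000, 0.006711409, 0.000000000, 0.009708738, 0.000000000),
  V5 = c(0.008695652, 0.986842105, 0.006578947, 0.986301370, 0.980198020, 0.975609756)
)

# Stack the frequency columns: one row per (location, timepoint) pair.
# A hypothetical third timepoint column would be appended in the same way.
long <- data.frame(
  location  = rep(dat$V1, times = 2),
  timepoint = factor(rep(c("timepoint 1", "timepoint 2"), each = nrow(dat))),
  freq      = c(dat$V4, dat$V5)
)

# group = location tells geom_line() to draw one line per genomic position
p <- ggplot(long, aes(x = timepoint, y = freq, group = location)) +
  geom_line() +
  geom_point() +
  labs(x = "Time point", y = "Minor allele frequency")
print(p)
```

With only two timepoints this should draw the same trajectories as the geom_segment() version, just from long-format data.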
0

# Set up an empty plot, then draw one arrow per location from V4 to V5
with(dat, plot(x=V1, y=V5, ylim=c(0,1), type='n',
      xaxt="n", ylab="Allele Frequency", xlab="Genomic Location"))
with(dat, axis(1, V1, V1, cex.axis=0.7))
with(dat, arrows(x0=V1, x1=V1+10, y0=V4, y1=V5))

You can clean up the labeling and tweak colors and arrowhead features:

?arrows
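
As a rough sketch of that kind of tweaking, assuming the question's example data is in dat: the colours, the arrowhead size and the 0.5 threshold for a "large increase" are arbitrary choices for illustration only.

```r
# Example data from the question (only the columns needed here)
dat <- data.frame(
  V1 = c(153, 390, 445, 507, 619, 649),
  V4 = c(0.008849558, 0.000000000, 0.006711409, 0.000000000, 0.009708738, 0.000000000),
  V5 = c(0.008695652, 0.986842105, 0.006578947, 0.986301370, 0.980198020, 0.975609756)
)

# Flag locations whose frequency rose by more than 0.5 (arbitrary cutoff)
rising <- dat$V5 - dat$V4 > 0.5

with(dat, plot(x = V1, y = V5, ylim = c(0, 1), type = "n", xaxt = "n",
               ylab = "Allele Frequency", xlab = "Genomic Location"))
with(dat, axis(1, at = V1, labels = V1, cex.axis = 0.7))

# length/angle control the arrowhead; col and lwd control the shaft
with(dat, arrows(x0 = V1, x1 = V1 + 10, y0 = V4, y1 = V5,
                 length = 0.08, angle = 20, lwd = 1.5,
                 col = ifelse(rising, "firebrick", "grey40")))
```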

IRTFM
  • Thank you for the suggestion. I'm actually looking to have the time points as the x-axis (the genomic location is just how I am storing the allele frequencies at different time points). I won't have a chance to play with the code until tomorrow or Tuesday, but I will report back once I do. – ONeillMB1 Nov 11 '13 at 05:27
  • You cannot get worked examples if you don't include values for time in your data. – IRTFM Nov 11 '13 at 07:36
0

It's not completely clear from the question, but I think this is what you're after:

ggplot(d, aes(x=V1, y=V4, ymin=V4, ymax=V5)) +
  geom_linerange() +
  xlab('Genomic location') +
  ylab('Minor allele frequency')

Docs: http://docs.ggplot2.org/current/geom_linerange.html

Rik Smith-Unna
  • Thank you for the suggestion! I am going to look into the geom_linerange() as this may get at what I am after. I actually want the time points to be the x-axis (the genomic location simply serves as a variable to store the allele freq at the two different time points.) – ONeillMB1 Nov 11 '13 at 05:21