I have data in the following format. Column V1 is the genomic location of interest, and columns V4 and V5 are the minor allele frequencies at two different points in time. I would like to make a simple xy scatter plot with a line connecting the allele frequency (plotted on the y-axis) for each specific location from timepoint 1 to timepoint 2. (Note: I actually have hundreds to thousands of data points.)
   V1    V2      V3          V4          V5
1 153 1/113   1/115 0.008849558 0.008695652
2 390 0/176 150/152 0.000000000 0.986842105
3 445 1/149   1/152 0.006711409 0.006578947
4 507 0/154 144/146 0.000000000 0.986301370
5 619 1/103  99/101 0.009708738 0.980198020
6 649 0/138 120/123 0.000000000 0.975609756
I feel like I should be able to accomplish this with ggplot, but I am not sure how to go about doing so, as I don't know how to specify two y-values for each genomic position, nor specify a column as a category. I suspect the data needs to be reshaped somehow. Any help or suggestions are greatly appreciated!
Update:
Thanks to all who gave me suggestions. I don't think I was very clear about wanting the time points to be my x-axis as opposed to the genomic position - my apologies. Hopefully this picture clarifies that!
I have successfully generated the plot I wished to make with the following code:
library(ggplot2)
ggplot(dat) + geom_segment(aes(x = "timepoint 1", y = V4, xend = "timepoint 2", yend = V5))
and this is what the plot looks like with more data points...
I haven't changed the axis titles or adjusted the margins yet, but this is the general idea!
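For comparison, the reshaping route suspected in the original question should give an equivalent plot. The following is only a minimal sketch, assuming the data frame is called dat as in the code above and has the columns shown in the sample; the names long, timepoint and freq are made up for illustration, and the y-axis label is just a suggestion.

```r
library(ggplot2)

# The six sample rows from above (V2 and V3 kept as plain strings)
dat <- data.frame(
  V1 = c(153, 390, 445, 507, 619, 649),
  V2 = c("1/113", "0/176", "1/149", "0/154", "1/103", "0/138"),
  V3 = c("1/115", "150/152", "1/152", "144/146", "99/101", "120/123"),
  V4 = c(0.008849558, 0.000000000, 0.006711409,
         0.000000000, 0.009708738, 0.000000000),
  V5 = c(0.008695652, 0.986842105, 0.006578947,
         0.986301370, 0.980198020, 0.975609756)
)

# Reshape to long format: one row per (genomic position, timepoint) pair.
# "long", "timepoint" and "freq" are illustrative names, not from the original.
long <- data.frame(
  V1        = rep(dat$V1, times = 2),
  timepoint = factor(rep(c("timepoint 1", "timepoint 2"), each = nrow(dat)),
                     levels = c("timepoint 1", "timepoint 2")),
  freq      = c(dat$V4, dat$V5)
)

# group = V1 draws one line per genomic position across the two timepoints
p <- ggplot(long, aes(x = timepoint, y = freq, group = V1)) +
  geom_line() +
  geom_point() +
  labs(x = NULL, y = "Minor allele frequency")
print(p)
```

The geom_segment version needs no reshaping and is simpler for exactly two timepoints; the long format mainly pays off if more timepoints are added later, since each new timepoint becomes extra rows rather than a new pair of aesthetics.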