I would like to plot average coverage depth across my genome, with the chromosomes lined up in increasing order. I have calculated the coverage depth per position for my genome using samtools. I would like to generate a plot of 1 kb window averages like Figure 7 here: http://www.g3journal.org/content/ggg/6/8/2421/F7.large.jpg?width=800&height=600&carousel=1
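`samtools depth` prints one line per position with three tab-separated columns (reference name, 1-based position, depth) and no header. By default it skips positions with zero depth unless `-a` is given. Below is a minimal sketch of loading such a file into R with `data.table::fread`, naming the columns as in the example dataframe. The temporary file and its contents are hypothetical stand-ins for a real depth file.

```r
library(data.table)

# Hypothetical stand-in for a samtools depth output file
# (tab-separated: chromosome, 1-based position, depth; no header line)
depth.file <- tempfile(fileext = ".depth")
writeLines(c("chr1\t1\t20", "chr1\t2\t24", "chr1\t3\t26",
             "chr2\t1\t53", "chr2\t2\t71", "chr2\t3\t74"), depth.file)

# Read it and give the columns the names used in the example dataframe
Xdepth <- fread(depth.file, header = FALSE, sep = "\t",
                col.names = c("Chr", "locus", "depth"))
str(Xdepth)
```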
Example dataframe:

```
Chr   locus  depth
chr1  1      20
chr1  2      24
chr1  3      26
chr2  1      53
chr2  2      71
chr2  3      74
chr3  1      29
chr3  2      36
chr3  3      39
```
Do I need to change the format of the dataframe so that the V2 (locus) variable is numbered continuously across chromosomes? Is there a way to average every 1000 lines and plot the resulting 1 kb windows? And how would I go about plotting?
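One common way to get continuous numbering is to add the summed length of all preceding chromosomes to each chromosome's positions. This is a sketch with data.table on the example dataframe. It assumes each chromosome's length is the largest `locus` observed (a reference `.fai` index would give exact lengths) and that rows are already in the desired chromosome order. The names `chr.len`, `offset` and `genome.pos` are hypothetical.

```r
library(data.table)

# Example dataframe from the question
Xdepth <- data.table(
  Chr   = rep(c("chr1", "chr2", "chr3"), each = 3),
  locus = rep(1:3, times = 3),
  depth = c(20, 24, 26, 53, 71, 74, 29, 36, 39)
)

# Chromosome lengths, approximated by the largest locus per chromosome
# (assumption: real lengths would come from the reference .fai index)
chr.len <- Xdepth[, .(len = max(locus)), by = Chr]

# Offset of each chromosome = summed length of all chromosomes before it,
# in the order the chromosomes first appear in the table
chr.len[, offset := cumsum(c(0, head(len, -1)))]

# Continuous, genome-wide coordinate via an update join on Chr
Xdepth[chr.len, genome.pos := locus + i.offset, on = "Chr"]
Xdepth$genome.pos  # 1 2 3 4 5 6 7 8 9
```

Because the offsets follow the order of rows in `chr.len`, sort `chr.len` into the desired chromosome order (for example chr2 before chr10) before computing `offset`.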
UPDATE: I was able to create a new dataset of average coverage in non-overlapping 1 kb windows using this post: Genome coverage as sliding window. I also made V2 continuous, i.e. 1:9 instead of 1,2,3,1,2,3,1,2,3.
```r
library(reshape)     # to rename columns
library(data.table)  # to make the windowed dataframe
library(zoo)         # rollapply() applies a function over each window

# Genome coverage in non-overlapping 1 kb windows, computed per chromosome.
# Assumes Xdepth has the columns Chr, locus and depth shown in the example.
Xdepth.average <- setDT(Xdepth)[, .(
  window.start = rollapply(locus, width = 1000, by = 1000, FUN = min,  align = "left", partial = TRUE),
  window.end   = rollapply(locus, width = 1000, by = 1000, FUN = max,  align = "left", partial = TRUE),
  coverage     = rollapply(depth, width = 1000, by = 1000, FUN = mean, align = "left", partial = TRUE)
), by = .(Chr)]
```
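The `rollapply()` call above averages blocks of 1000 *rows*. Here is an alternative sketch, not the method from the linked post. It assigns each row to a window from its position using integer division, so windows stay 1 kb wide even if some positions are missing, as happens when `samtools depth` is run without `-a`. The data are simulated and the `window` column name is hypothetical.

```r
library(data.table)

# Simulated data: two chromosomes with 2,500 and 1,200 positions
set.seed(1)
Xdepth <- data.table(
  Chr   = rep(c("chr1", "chr2"), times = c(2500, 1200)),
  locus = c(1:2500, 1:1200),
  depth = rpois(3700, lambda = 30)
)

win <- 1000L  # window size in bp

# Window index from the position itself: 1-1000 -> 0, 1001-2000 -> 1, ...
Xdepth[, window := (locus - 1L) %/% win]

# One row per chromosome and window: first/last position and mean depth
Xdepth.average <- Xdepth[, .(
  window.start = min(locus),
  window.end   = max(locus),
  coverage     = mean(depth)
), by = .(Chr, window)]

Xdepth.average
```

Grouping by `Chr` as well as `window` keeps windows from straddling chromosome boundaries. The last window of each chromosome is simply shorter.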
And to plot:

```r
library(ggplot2)

Xdepth.average.plot <- ggplot(Xdepth.average, aes(x = window.end, y = coverage, colour = Chr)) +
  geom_point(shape = 20, size = 1) +
  scale_x_continuous(name = "Genomic Position (bp)", limits = c(0, 12071326), labels = scales::scientific) +
  scale_y_continuous(name = "Average Coverage Depth", limits = c(0, 200))

print(Xdepth.average.plot)
```
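`facet_grid()` can also give one panel per chromosome when positions are kept per chromosome rather than made continuous. `scales = "free_x"` gives each panel its own x range, and `space = "free_x"` makes panel widths proportional to that range. Below is a minimal sketch on made-up windowed data. Whether the layout suits the real genome is an assumption.

```r
library(data.table)
library(ggplot2)

# Hypothetical windowed data with per-chromosome (not continuous) positions
Xdepth.average <- data.table(
  Chr        = rep(c("chr1", "chr2", "chr3"), times = c(5, 3, 4)),
  window.end = c(1:5, 1:3, 1:4) * 1000,
  coverage   = c(30, 32, 29, 31, 35, 60, 58, 62, 28, 30, 33, 31)
)

# Fix the panel order (factor levels are otherwise sorted alphabetically)
Xdepth.average[, Chr := factor(Chr, levels = unique(Chr))]

p <- ggplot(Xdepth.average, aes(x = window.end, y = coverage)) +
  geom_point(shape = 20, size = 1) +
  # one panel per chromosome; panel width proportional to its x range
  facet_grid(. ~ Chr, scales = "free_x", space = "free_x") +
  scale_x_continuous(name = "Position within chromosome (bp)") +
  scale_y_continuous(name = "Average Coverage Depth") +
  theme(panel.spacing.x = grid::unit(0, "lines"))

# Draw with print(p), or save with ggsave("coverage.png", p)
```

This avoids computing offsets by hand, at the cost of restarting the x axis in each panel.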
I didn't have any luck using facet_grid, so I added reference lines with geom_vline(xintercept = c(...)). See the answer I posted below for extra details and code, as well as links to plots. Now I just need to work on the labeling...
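For the labeling on a continuous axis, one option works in three steps. First, compute each chromosome's start, end and midpoint from the windowed data. Next, draw `geom_vline()` at the internal boundaries. Finally, place the chromosome names at the midpoints through `breaks` and `labels` in `scale_x_continuous()`. This is a sketch on hypothetical data. It assumes `window.end` is already the continuous genome-wide coordinate and that windows are 1 kb. `chr.bounds` and `win` are illustrative names.

```r
library(data.table)
library(ggplot2)

win <- 1000  # window size in bp (assumed)

# Hypothetical windowed data; window.end is a continuous genome-wide coordinate
Xdepth.average <- data.table(
  Chr        = rep(c("chr1", "chr2", "chr3"), times = c(5, 3, 4)),
  window.end = (1:12) * win,
  coverage   = c(30, 32, 29, 31, 35, 60, 58, 62, 28, 30, 33, 31)
)

# Start, end and midpoint of each chromosome on the continuous axis
chr.bounds <- Xdepth.average[, .(start = min(window.end) - win,
                                 end   = max(window.end)), by = Chr]
chr.bounds[, mid := (start + end) / 2]

p <- ggplot(Xdepth.average, aes(x = window.end, y = coverage, colour = Chr)) +
  geom_point(shape = 20, size = 1) +
  # dashed lines between chromosomes (the last end is the genome end, so drop it)
  geom_vline(xintercept = head(chr.bounds$end, -1), linetype = "dashed") +
  # chromosome names centred under each chromosome instead of bp ticks
  scale_x_continuous(name = "Chromosome", breaks = chr.bounds$mid,
                     labels = chr.bounds$Chr) +
  scale_y_continuous(name = "Average Coverage Depth") +
  theme(legend.position = "none")  # the axis labels now name the chromosomes
```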