I mapped my reads to my assembly using the bwa mem
algorithm and extracted the number of reads per base (= coverage) using samtools depth
. The resulting file is the following:
1091900001 1 236
1091900001 2 245
1091900001 3 265
1091900001 4 283
1091900001 5 288
1091900002 1 297
1091900002 2 312
1091900002 3 327
1091900002 4 338
1091900002 5 348
with three columns: name of the contig (since it is a multi-contig file, this ID changes) - position (base) - number of reads that mapped (coverage).
Now I want to calculate the coverage (third column) in sliding windows; in a window size of 3 and slide of 2 as the mean - per contig (first column).
I want to use the rollapply
function of the zoo
package.
require(zoo)
cov <- read.table("file",header=FALSE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
library(reshape) #loads the library to rename the column names
cov<-rename(cov,c(V1="Chr", V2="locus", V3="depth")) #renames the header
rollapply(cov$depth, width = 3, by = 2, FUN = mean, align = "left")
But this of course does not take into account the contig. Plus, my expected output should include the contig-info & the window, it was calculated:
1091900001 1 3 248.6667
1091900001 3 5 278.6667
1091900002 1 3 312.0000
1091900002 3 5 337.6667
Is there an easy way for doing that in R
?