I am trying to write a function that returns the number of times a specific pair of adjacent nucleotides (C immediately followed by G) occurs within each window of a sequence that I have stored as a character vector.
I would like the windows to be 100 nucleotides long and to shift by 10 nucleotides each time.
The data is set up like this (up to 10k entries):
data <- c("a", "g", "t", "t", "g", "t", "t", "a", "g", "t", "c", "t",
"a", "c", "g", "t", "g", "g", "a", "c", "c", "g", "a", "c")
So far I have tried this:
library(zoo)
library(seqinr)
rollapply(data, width=100, by=10, FUN=count(data, wordsize=2))
But I always get this error:
"Error in match.fun(FUN) :
'count(data, 2)' is not a function, character or symbol"
I have also tried:
starts <- seq(1, length(data)-100, by = 100)
n <- length(starts)
for (i in 1:n) {
  chunk <- data[starts[i]:(starts[i] + 99)]
  chunkCG <- count(chunk, wordsize = 2)
  print(chunkCG)
}
However, I do not know how to save the results that are returned. This approach also does not allow the windows to overlap.
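For reference, here is a minimal base-R sketch of the kind of result I am after, assuming only the "cg" count per window is needed. The helper name count_cg_windows and its arguments are made up for illustration, and it does not use zoo or seqinr. It collects one count per overlapping window into a data frame instead of printing it:

```r
# Hypothetical helper (not from zoo or seqinr): count "c" followed by "g"
# in overlapping windows of `width` nucleotides, moving `by` positions each step.
count_cg_windows <- function(x, width = 100, by = 10) {
  stopifnot(width >= 2, length(x) >= width)
  # is_cg[i] is TRUE when x[i] is "c" and x[i + 1] is "g"
  is_cg <- x[-length(x)] == "c" & x[-1] == "g"
  # Start positions of every complete window
  starts <- seq(1, length(x) - width + 1, by = by)
  # A window of `width` nucleotides contains width - 1 adjacent pairs
  counts <- vapply(starts,
                   function(s) sum(is_cg[s:(s + width - 2)]),
                   integer(1))
  data.frame(start = starts, end = starts + width - 1, cg = counts)
}

# Small example on the 24-nucleotide sample, with a shorter window so it fits
dna <- c("a", "g", "t", "t", "g", "t", "t", "a", "g", "t", "c", "t",
         "a", "c", "g", "t", "g", "g", "a", "c", "c", "g", "a", "c")
res <- count_cg_windows(dna, width = 10, by = 5)
print(res)  # windows start at 1, 6, 11 with cg counts 0, 1, 1
```

On the real data the call would presumably be count_cg_windows(data, width = 100, by = 10). A pair that straddles a window boundary is only counted in windows that contain both nucleotides.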