My data looks like the sample below, but with millions of lines. The sample can be copied into a text file to reproduce my example.
@HISEQ:104:C7Y3WACXX:4:1101:1307:1946 1:N:0:CGATGT
NTCCGGTAGTGTAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACC
+
#0<FFFBBFBFFFFFIFIFIIIIIIIFIIIIIIIIIIIIIIIIFIIFIII
@HISEQ:104:C7Y3WACXX:4:1101:1356:1968 1:N:0:CGATGT
CGAGAGCTTTGAAGGCCGAAGTGGAAGATCGGAAGAGCACACGTCTGAAC
+
BBBFFFFFFFFFFFFFFFIIIBFFIIIIIFIIIIIIIIIIIIIFFFFFFF
I am trying to read in the text above and determine the lengths of the lines that start with N, A, G, C or T (the sequence lines). I would usually do something like this:
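# Read the first 8 lines as character strings, one element per line
f <- scan(filepath, nmax = 8, what = "character", sep = "\n")
# Keep only the lines that start with a nucleotide letter
f1 <- f[grep("^[NAGCT]+", f)]
nchar(f1)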
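For comparison, here is a minimal base R sketch (not ff) of the same computation done in chunks, so that the whole file never has to sit in memory at once. It assumes strict four-line FASTQ records, so the sequence is always the second line of each record and no pattern matching is needed. The function name seq_lengths_chunked and the records_per_chunk parameter are made up for illustration.

```r
# Sketch: sequence lengths from a FASTQ file, read in fixed-size chunks.
# Assumption: strict 4-line records (header, sequence, "+", quality).
seq_lengths_chunked <- function(filepath, records_per_chunk = 100000L) {
  con <- file(filepath, open = "r")
  on.exit(close(con))
  lens <- list()
  repeat {
    # Read a whole number of records so the 4-line phase never drifts
    chunk <- readLines(con, n = 4L * records_per_chunk)
    if (length(chunk) == 0L) break
    # The sequence is line 2 of each record
    seqs <- chunk[seq_along(chunk) %% 4L == 2L]
    lens[[length(lens) + 1L]] <- nchar(seqs)
  }
  # Only the integer lengths are kept in memory, not the reads themselves
  as.integer(unlist(lens))
}
```

On the two-record sample above, seq_lengths_chunked(filepath) gives 50 50, the same as the scan/grep version.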
How would I go about doing the same with the ff package?
library(ff)
f <- read.table.ffdf(file = filepath, header = FALSE, nrows = 8, sep = "\n")
I have tried various approaches but none of them work.