I developed a script that processes two large numeric vectors (each thousands of elements long), "parent" and "product".
The starting data look something like this:
parent<-sample(1:10000,3500)
product<-sample(1:7500,2500)
mztol<-0.0015
mzdiff<-sample(1:1000,31)
names(mzdiff)<-paste0("d", 1:31)   # "d1", "d2", ..., "d31"
First, I applied outer() to get the matrix of all pairwise differences between the two vectors:
tabdiff<-outer(product,parent,'-')
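To illustrate what this step produces (toy values here, not my real data), the result has one row per product element and one column per parent element:

```r
product_small <- c(10, 20, 30)
parent_small  <- c(1, 2)
outer(product_small, parent_small, '-')
#      [,1] [,2]
# [1,]    9    8
# [2,]   19   18
# [3,]   29   28
```

So with the real sizes above, tabdiff is a 2500 x 3500 matrix.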
Then, to test which differences fall within a tolerance (mztol) of each value in mzdiff, I subtracted each element of mzdiff from the matrix tabdiff, again with outer():
subfun<-function(x,y) abs(x-y)<=mztol
vsubfun<-Vectorize(subfun)
vlogres<-outer(tabdiff,mzdiff,vsubfun)
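(A side note, and an assumption of mine rather than something I have profiled: since abs() and `<=` are already vectorized, the Vectorize() wrapper may be redundant here, because outer() calls FUN once on its fully expanded arguments. A toy check, which I would expect to return TRUE:)

```r
mztol <- 0.0015
tab   <- matrix(c(0.001, 0.5, 0.002, 1), nrow = 2)   # stand-in for tabdiff
dvec  <- c(d1 = 0.0005, d2 = 0.9995)                 # stand-in for mzdiff
f     <- function(x, y) abs(x - y) <= mztol
identical(outer(tab, dvec, Vectorize(f)), outer(tab, dvec, f))
```

Dropping the wrapper avoids the per-element mapply() overhead, though it does not by itself change the size of the final result.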
This gives a 3-dimensional logical array, with one 2500 x 3500 slice per element of mzdiff. Then I converted it into a list of matrices with plyr:
library(plyr)
listres<-alply(vlogres,3,.dims=TRUE)
and counted the TRUE elements in each slice:
result<-sapply(listres, function(x) table(x)["TRUE"])
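To double-check the counting step on a toy matrix (noting, as an aside, that sum() on a logical matrix counts the TRUEs the same way, but returns 0 instead of NA when there are none):

```r
m <- matrix(c(TRUE, FALSE, TRUE, TRUE), nrow = 2)
table(m)["TRUE"]   # 3
sum(m)             # 3
```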
The script works fine if I process only small parent and product vectors, like:
parent<-sample(1:1000,150)
product<-sample(1:1500,500)
With the large vectors, I get the error "Error: memory exhausted (limit reached?)" while computing vlogres, even though my workstation has 16 GB of RAM.
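For context, here is my back-of-the-envelope estimate of the footprint (assuming the sizes above, and that a logical value takes 4 bytes in R):

```r
# vlogres alone: 2500 x 3500 x 31 logicals at 4 bytes each
2500 * 3500 * 31 * 4 / 2^30    # ~1 GiB for the final array

# outer() first expands BOTH arguments to the full output length
# (271,250,000 doubles each) before calling FUN:
2500 * 3500 * 31 * 8 / 2^30    # ~2 GiB per expanded argument
```

On top of that, Vectorize()/mapply() adds per-element call overhead and intermediate copies, which is presumably where the 16 GB gets consumed.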
So how can I optimize this script to avoid running out of memory? Any hints?