I have a vector of 300 sentences, and I am trying to find elementwise JW distance using the stringdist
package. The execution time for the naive implementation is too high, leading me to look for ways to reduce the runtime. I am trying to leverage the doParallel
and foreach
packages, but I'm not getting any significant speedup. This is how I am going about it.
library(foreach)
library(doParallel)
cl = makeCluster(detectCores())
registerDoParallel(cl)
sentence = # vector containing sentences
jw_dist = foreach(i = 1:length(sentence)) %dopar% {
temp = sentence[sentence!=sentence[i]]
return(mean(1 - stringdist::stringdist(sentence[i],temp,method = "jw",nthread = 3))
}
stopCluster(cl)
I would really appreciate if someone can point out ways in which I can speed up this chunk of code.