I am new to R and tm, and I am trying to process a large text data frame in parallel with a foreach %dopar% loop, since I read that this is much faster. However, I do not really understand how it works, or how to convert my initial for loop into a parallel foreach loop.
Specifically, I want to determine the polarities of my data set, and I need the score for many different polarity frames (WORDKEY). The results should be collected in a summary data frame (frequency_w). My for loop so far looks as follows (it works fine for smaller samples):
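    for (i in seq_along(POLKEY$x)) {
      # Build the polarity frame for the i-th key word and its weight
      WORDKEY <- sentiment_frame(as.character(POLKEY$x[i]), NULL, as.integer(POLKEY$y[i]))
      # Score every document against that frame
      Poldat2 <- with(data, polarity(text, list(docs), polarity.frame = WORDKEY,
                                     negators = Negator, amplifiers = Ampl,
                                     deamplifiers = DeAmpl, amplifier.weight = 1))
      # Store the document labels and the average polarity for key i
      frequency_w$docs <- as.factor(Poldat2[["group"]][, "docs"])
      frequency_w[(i + 1)] <- as.numeric(Poldat2[["group"]][, "ave.polarity"])
    }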
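For reference, this is roughly how I imagine the foreach version of the loop above. It is only a sketch: it assumes the foreach and doParallel packages are installed, that POLKEY, data, Negator, Ampl and DeAmpl already exist in the workspace as in the loop above, and that polarity() returns the groups in the same order for every key. The worker count is an arbitrary choice of mine.

```r
library(foreach)
library(doParallel)

# Assumption: leave one core free; adjust to the machine
cl <- makeCluster(max(1, parallel::detectCores() - 1))
registerDoParallel(cl)

# One task per polarity key; each task returns the per-document scores
res <- foreach(i = seq_along(POLKEY$x), .packages = "qdap") %dopar% {
  WORDKEY <- sentiment_frame(as.character(POLKEY$x[i]), NULL,
                             as.integer(POLKEY$y[i]))
  Poldat2 <- with(data, polarity(text, list(docs), polarity.frame = WORDKEY,
                                 negators = Negator, amplifiers = Ampl,
                                 deamplifiers = DeAmpl, amplifier.weight = 1))
  Poldat2[["group"]][, c("docs", "ave.polarity")]
}

stopCluster(cl)

# Assemble the summary frame: document labels first, then one column per key
frequency_w <- data.frame(docs = as.factor(res[[1]]$docs))
for (i in seq_along(res)) {
  frequency_w[[i + 1]] <- as.numeric(res[[i]]$ave.polarity)
}
```

Each worker receives its own copy of data, so with a large data set the number of workers is probably limited by memory rather than by cores.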
The main problem is that this code takes forever to run on my entire data set (80,000 docs), so I would also be happy about any other recommendations for reducing memory usage or increasing speed.
Moreover, since I need to run sentSplit before I can use the polarity function, I would be glad if anybody has an idea how to speed up that step for the entire data set as well, or even how to include it in the loop.
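For the sentSplit step, the only idea I have is to cut the data frame into row chunks and split each chunk on a separate worker. The following is an untested sketch: raw_data (the unsplit data frame with a text column) and chunk_size are hypothetical names and values of mine, and I assume the tot column that sentSplit adds will restart in every chunk, so it would need to be rebuilt if it matters.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(max(1, parallel::detectCores() - 1))
registerDoParallel(cl)

# Hypothetical chunk size; smaller chunks mean less memory per task
chunk_size <- 5000
rows <- seq_len(nrow(raw_data))
chunks <- split(rows, ceiling(rows / chunk_size))

# Split sentences chunk by chunk and stack the results in order
data <- foreach(idx = chunks, .combine = rbind, .packages = "qdap") %dopar% {
  # as.data.frame() drops the extra qdap classes so rbind treats it as a plain data frame
  as.data.frame(sentSplit(raw_data[idx, , drop = FALSE], "text"))
}

stopCluster(cl)
```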
Thank you so much for your help in advance!