R tm: determine polarity of documents in parallel using foreach

Question

I am new to text mining in R (tm) and I am trying to process a large text data frame in parallel using a foreach %dopar% loop, as I have read that this is much faster. However, I do not really understand how it works, or how to convert my initial for loop into a parallel foreach loop.

In particular, I want to determine the polarity of the documents in my data set, and I need a score for each of many different polarity frames (WORDKEY). The results should be written to a summary data frame (frequency_w). My for loop so far looks as follows (it works fine for smaller samples):

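for (i in seq_along(POLKEY$x)) {
  # Build the polarity frame for the i-th key word and its weight
  WORDKEY <- sentiment_frame(as.character(POLKEY$x[i]), NULL, as.integer(POLKEY$y[i]))
  # Score every document against this polarity frame
  Poldat2 <- with(data, polarity(text, list(docs), polarity.frame = WORDKEY, negators = Negator, amplifiers = Ampl, deamplifiers = DeAmpl, amplifier.weight = 1))
  frequency_w$docs <- as.factor(Poldat2[["group"]][, "docs"])
  # Column i + 1 holds the average polarity for key i
  frequency_w[(i + 1)] <- as.numeric(Poldat2[["group"]][, "ave.polarity"])
}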
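
For illustration, here is a minimal sketch of how a loop of this shape might be converted to foreach %dopar%. It assumes the foreach and doParallel packages are installed. The function score_key() and the toy docs vector are hypothetical stand-ins for the sentiment_frame()/polarity() step and the real data; they are not part of qdap. The idea is that each iteration returns its column of scores instead of writing into frequency_w, and .combine = cbind assembles the columns afterwards.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for the per-key scoring step. In the real loop this
# would call sentiment_frame() and polarity() and return the "ave.polarity"
# column. The arithmetic here is a placeholder, NOT a polarity score.
score_key <- function(i, docs) {
  nchar(docs) * i
}

docs <- c("good day", "bad", "fine")  # toy data (assumption)
n_keys <- 4                           # stands in for length(POLKEY$x)

# Start two worker processes and register them as the %dopar% backend
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration RETURNS one column; cbind glues the columns together.
# With qdap, adding .packages = "qdap" would load it on every worker.
scores <- foreach(i = seq_len(n_keys), .combine = cbind) %dopar% {
  score_key(i, docs)
}

stopCluster(cl)

# Assemble the summary frame once, after the parallel part has finished
frequency_w <- data.frame(docs = factor(docs), scores)
```

Writing to frequency_w inside the %dopar% body would not work as in the sequential loop, because each worker operates on its own copy of the data; returning a value per iteration and combining afterwards avoids that.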

The main problem is that the code takes forever to run over my entire data set (80,000 docs). If you have any other coding recommendations to reduce memory usage or increase speed, I'd be happy to hear them.

Moreover, since I need to run sentSplit before I can use the polarity function, I'd also be glad to hear any ideas on how to speed up that step for my entire data set, or even how to include it in the loop.

Thank you so much for your help in advance!

I am the author of qdap. I'd recommend you use the breakout package sentimentr (https://github.com/trinker/sentimentr). It has an improved algorithm and improved speed. — Tyler Rinker, Oct 28 '15 at 12:11
Thank you very much for your help and fast reply! I didn't expect the author of the package to answer, which is great. I will try it out as soon as possible. As I want to run the sentiment for each word, do I need to use the foreach loop, or is there an easier way with your package? By the way, why did you decide to divide by the square root of N and not just N? Thanks again! — C. G., Oct 28 '15 at 13:38
Sentiment of each word is not possible. I'd break the strings up to words and use a simple lookup with the sentiment dictionary. Divided by square root of N to give it less impact. — Tyler Rinker, Oct 28 '15 at 14:54
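
The word-level lookup suggested in the last comment could be sketched in base R roughly as follows. The lexicon vector and the word_scores() function are hypothetical and purely illustrative; they are not taken from qdap or sentimentr, and the tokenizer is deliberately naive. The final line divides the summed score by the square root of the word count, as mentioned in the comment, without claiming to reproduce either package's full algorithm.

```r
# Hypothetical toy dictionary: word -> sentiment weight (assumption)
lexicon <- c(good = 1, great = 1, bad = -1, terrible = -1)

# Split a string into lower-case words and look each one up in the
# dictionary; words that are not in the dictionary score 0.
word_scores <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]   # drop empty tokens
  scores <- lexicon[words]        # lookup by name, NA if missing
  scores[is.na(scores)] <- 0
  names(scores) <- words
  scores
}

s <- word_scores("The food was great", lexicon)

# Summed score scaled by sqrt(N) rather than N, so that longer texts
# are penalised less strongly for their length
sum(s) / sqrt(length(s))
```

Because the lookup is a plain vectorised index into a named vector, it can be applied to many documents with sapply() or lapply() without an explicit parallel loop.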
