
I have a file of tweets that I need to perform sentiment analysis on. I came across this approach, which works well; however, I now want to alter the code so that I can assign different scores based on sentiment.

This is the code:

    score.sentiment = function(sentences, pos.words, neg.words, progress='none')
    {
      require(plyr)
      require(stringr)
      scores = laply(sentences, function(sentence, pos.words, neg.words)
      {
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)
        sentence = tolower(sentence)
        word.list = str_split(sentence, '\\s+')
        words = unlist(word.list)
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)
        score = sum(pos.matches) - sum(neg.matches)
        return(score)
      }, pos.words, neg.words, .progress=.progress)
      scores.df = data.frame(scores=scores, text=sentences)
      return(scores.df)
    }

What I am now looking to do is to have FOUR dictionaries:

super.words, pos.words, neg.words, terrible.words.

I want to assign a different score to each of these dictionaries: super.words = +2, pos.words = +1, neg.words = -1, terrible.words = -2.

I know that `pos.matches = !is.na(pos.matches)` and `neg.matches = !is.na(neg.matches)` give TRUE/FALSE values, which `sum()` counts as 1/0; however, I want to find out how to assign these specific scores so that EACH tweet gets its own score.
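A quick illustration of that point, using made-up toy word lists rather than real dictionaries: `match()` returns `NA` for words not in a dictionary, `!is.na()` turns the result into TRUE/FALSE, and `sum()` counts TRUE as 1. Multiplying each per-dictionary count by a weight is then enough to get the +2/+1/-1/-2 scheme for one tweet.

```r
# Toy dictionaries (hypothetical, for illustration only)
super.words    <- c("amazing", "fantastic")
pos.words      <- c("good", "happy")
neg.words      <- c("bad", "sad")
terrible.words <- c("awful", "evil")

# One tweet, already cleaned and split into words
words <- c("amazing", "news", "very", "happy", "but", "evil", "things")

# match() gives a position in the dictionary, or NA if the word is absent;
# !is.na() turns that into TRUE/FALSE
pos.matches <- !is.na(match(words, pos.words))

# sum() treats TRUE as 1 and FALSE as 0, so this counts positive words
sum(pos.matches)  # 1 ("happy")

# Multiplying each count by its weight gives +2, +1, -1, -2 per matched word
score <- 2 * sum(!is.na(match(words, super.words))) +
         1 * sum(!is.na(match(words, pos.words))) -
         1 * sum(!is.na(match(words, neg.words))) -
         2 * sum(!is.na(match(words, terrible.words)))
score  # 2 + 1 - 0 - 2 = 1
```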

At the moment, I am just focusing on the standard two dictionaries, pos and neg. I have assigned scores in these two data frames:

    posDF <- data.frame(words=pos, value=1, stringsAsFactors=FALSE)
    negDF <- data.frame(words=neg, value=-1, stringsAsFactors=FALSE)

and tried to run the function above with these, but nothing works.
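For reference, a minimal base-R sketch of how data frames shaped like `posDF` and `negDF` could drive the scoring directly: stack them into one lookup table, then for each tweet look up every word's `value` and add the values up. The `pos`/`neg` vectors, the sample sentences and the helper name `score.lookup` are all hypothetical stand-ins, and `strsplit()`/`sapply()` are used in place of stringr/plyr so the snippet needs no extra packages.

```r
# Stand-ins for the asker's pos/neg word vectors (hypothetical)
pos <- c("good", "happy", "strong")
neg <- c("bad", "evil")

posDF <- data.frame(words = pos, value = 1,  stringsAsFactors = FALSE)
negDF <- data.frame(words = neg, value = -1, stringsAsFactors = FALSE)

# One lookup table holding every scored word and its value
lexicon <- rbind(posDF, negDF)

# Hypothetical helper: clean one tweet, split it into words, sum the word values
score.lookup <- function(sentence, lexicon) {
  sentence <- gsub("[[:punct:]]", "", sentence)
  sentence <- gsub("[[:cntrl:]]", "", sentence)
  sentence <- gsub("\\d+", "", sentence)
  words <- unlist(strsplit(tolower(sentence), "\\s+"))
  # Row index of each word in the lexicon, NA if the word is not scored
  idx <- match(words, lexicon$words)
  # Unscored words give NA values, which na.rm = TRUE drops
  sum(lexicon$value[idx], na.rm = TRUE)
}

tweets <- c("Very happy, good news!", "Sweet things are looking evil")
scores <- sapply(tweets, score.lookup, lexicon = lexicon, USE.NAMES = FALSE)
scores  # 2 and -1
table(scores)
```

Adding a `superDF` with `value = 2` and a `terribleDF` with `value = -2` to the `rbind()` would extend this to four dictionaries. If a word appears in more than one data frame, `match()` only uses the first row it finds, so the word lists should not overlap.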

I came across this page and this page, where someone has written several `for` loops; however, the end result only gives an overall score of -1, 0 or 1.

Ultimately, I am looking for a result similar to this:

    table(analysis$score)

     -5   -4   -3   -2   -1    0    1    2    3    4    5    6   19
      3    8   49  164  603 2790  ...etc

However, so far, whenever I get a result that doesn't require me to "debug" the code, I get this:

    < table of extent 0 >

Here are some sample Tweets I am using:

    tweets <- data.frame(words = c(
      "@UKLabour @KarlTurnerMP #LabourManifesto Speaking as a carer, labours NHS plans are all good news, very happy. Making my day this!",
      "#LabourManifesto eggs and sweet things are looking evil",
      "@UKLabour @KarlTurnerMP Half way through the #LabourManifesto, this will definitely improve every-bodies lives if implemented fully.",
      "There is nothing \"long term\" about fossil fuels. #fracking #labourmanifesto https://twitter.com/stevetopple/status/587576796599595012",
      "Fair play Ed, very strong speech! Finally had the chance to watch it. #LabourManifesto wanna see the other manifestos nowwww"
    ), stringsAsFactors = FALSE)

Any help is greatly appreciated!



So, essentially, I am wondering if there is a way to change this section of the original script:

    pos.matches=match(words,pos.words)
    neg.matches=match(words,neg.words)
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)

so that I can assign my own specific scores (pos.words = +1, neg.words = -1)? Or would I have to incorporate various `if` statements and `for` loops?

L. Natalka
  • If you are just looking to use custom scores in generating the total score, you could just change this line `score=sum(pos.matches)-sum(neg.matches)` to be something like `score=sum(super.pos.matches)*2 + sum(pos.matches) + sum(neg.matches)*(-1) + sum(terrible.matches)*(-2)` – scribbles Aug 07 '15 at 15:13
  • Thank you @scribbles - I didn't even think to alter that line! – L. Natalka Aug 08 '15 at 10:40
  • Did that solve your problem? If so I'll change my comment to an answer that you can accept so that others know this is no longer an open question – scribbles Aug 08 '15 at 14:19
  • I think the second half of your question should be posted as a separate question - it seems @scribbles has answered the first part. – Darren Cook Aug 08 '15 at 17:47
  • The way you mentioned worked for changing the score for just the pos and neg words, however now I would like to apply this to 4 dictionaries and this doesn't work. The algorithm only works with pos and neg. When trying to add more in the first line, the error message appears: Error: unexpected symbol in "score.sentiment = function(sentences, pos1.words, pos2.words, neg1.words, neg2.words .progress" and following from that, the function seems to only take positive and negative. Do you think I would have to write two of these functions and then somehow combine the results at the end? – L. Natalka Aug 11 '15 at 09:52

2 Answers


If you are just looking to use custom scores in generating the total score, you could simply change the line `score=sum(pos.matches)-sum(neg.matches)` to something like:

    score = sum(super.pos.matches)*2 + sum(pos.matches) + sum(neg.matches)*(-1) + sum(terrible.matches)*(-2)
scribbles

If you are using four dictionaries, the code below should help. (Also, in your function definition you are missing a "." in front of `progress`: it needs to be `.progress`, because that is the name passed on in the `laply()` call.)

    score.sentiment = function(sentences, super.pos.words, pos.words, neg.words,
                               terrible.words, .progress='none')
    {
      require(plyr)
      require(stringr)
      scores = laply(sentences, function(sentence, super.pos.words, pos.words,
                                         neg.words, terrible.words)
      {
        # strip punctuation, control characters and digits, then lower-case
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)
        sentence = tolower(sentence)
        word.list = str_split(sentence, '\\s+')
        words = unlist(word.list)
        # TRUE where a word appears in the given dictionary, FALSE otherwise
        super.pos.matches = !is.na(match(words, super.pos.words))
        pos.matches = !is.na(match(words, pos.words))
        neg.matches = !is.na(match(words, neg.words))
        terrible.matches = !is.na(match(words, terrible.words))
        # weighted score: +2, +1, -1, -2 per matching word
        score = sum(super.pos.matches)*2 + sum(pos.matches) -
                sum(neg.matches) - sum(terrible.matches)*2
        return(score)
      }, super.pos.words, pos.words, neg.words, terrible.words, .progress=.progress)
      scores.df = data.frame(scores=scores, text=sentences)
      return(scores.df)
    }
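The comments also ask how to go beyond two dictionaries without writing several functions. One option, sketched here under stated assumptions, is to pass the dictionaries as a named list together with a matching named vector of weights, so the function signature never has to change. The function name `score.sentiment.weighted`, its arguments and the word lists are hypothetical toy examples, and base R (`sapply()`/`strsplit()`) stands in for plyr/stringr so it runs without extra packages.

```r
# Hypothetical generalisation: any number of dictionaries, each with its own weight
score.sentiment.weighted <- function(sentences, dictionaries, weights) {
  # weights must be named in the same order as the dictionaries
  stopifnot(identical(names(dictionaries), names(weights)))
  scores <- sapply(sentences, function(sentence) {
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    words <- unlist(strsplit(tolower(sentence), "\\s+"))
    # Number of words found in each dictionary (TRUE counts as 1)
    counts <- sapply(dictionaries, function(dict) sum(words %in% dict))
    sum(counts * weights)
  }, USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Toy dictionaries (hypothetical word lists)
dictionaries <- list(
  super    = c("amazing", "fantastic"),
  pos      = c("good", "happy", "strong"),
  neg      = c("bad", "sad"),
  terrible = c("awful", "evil")
)
weights <- c(super = 2, pos = 1, neg = -1, terrible = -2)

tweets <- c("Very happy, good news!",
            "Sweet things are looking evil",
            "Amazing speech, very strong")
analysis <- score.sentiment.weighted(tweets, dictionaries, weights)
analysis$score  # 2, -2, 3
table(analysis$score)
```

`words %in% dict` is the same test as `!is.na(match(words, dict))`. Keeping the weights in one named vector means a fifth dictionary is one more list element and one more weight. A word listed in two dictionaries is counted in both.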