I have a DataFrame that contains rows of tweets, and I would like to create four columns holding the 'positive', 'negative', 'neutral' and 'compound' scores for the content of each row, using VADER sentiment analysis.

I looked through different posts but couldn't figure it out for my exact case. Thank you in advance!

Specter07

3 Answers

I actually found a simple solution using list comprehensions, for anyone facing the same problem:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
# Score each tweet once, then pull each component into its own column
scores = [analyzer.polarity_scores(x) for x in df['tweet']]
df['compound'] = [s['compound'] for s in scores]
df['neg'] = [s['neg'] for s in scores]
df['neu'] = [s['neu'] for s in scores]
df['pos'] = [s['pos'] for s in scores]
Specter07

Something like this should work:

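import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
df['rating'] = df['tweets'].apply(analyzer.polarity_scores)  # one dict of scores per row
# pd.concat returns a new DataFrame, so assign the result back to df
df = pd.concat([df.drop(['rating'], axis=1), df['rating'].apply(pd.Series)], axis=1)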
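For larger frames, the dict-to-columns step can also be done in a single pass without `apply(pd.Series)`. The following is only a minimal sketch under some assumptions: `score_frame` and `toy_scorer` are hypothetical names made up for illustration, and `toy_scorer` merely stands in for `analyzer.polarity_scores` so the snippet runs without VADER installed. Its numbers are not VADER's.

```python
import pandas as pd


def score_frame(df, text_col, scorer):
    """Return a copy of df with one new column per key of the dict returned by scorer."""
    # Build a DataFrame from the list of score dicts, aligned on df's index.
    scores = pd.DataFrame(df[text_col].map(scorer).tolist(), index=df.index)
    return df.join(scores)


def toy_scorer(text):
    """Hypothetical stand-in that returns the same keys as polarity_scores."""
    words = text.lower().split()
    total = max(len(words), 1)
    pos = sum(w in {"good", "great", "love"} for w in words)
    neg = sum(w in {"bad", "awful", "hate"} for w in words)
    return {
        "neg": neg / total,
        "neu": (total - pos - neg) / total,
        "pos": pos / total,
        "compound": (pos - neg) / total,
    }


df = pd.DataFrame({"tweet": ["I love this", "this is bad", "just a tweet"]})
result = score_frame(df, "tweet", toy_scorer)
print(result)
```

With the real analyzer, the call would presumably look like `score_frame(df, 'tweets', analyzer.polarity_scores)`.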
luigigi

I have done the same kind of work using VADER for sentiment analysis in Python 3. Take a look; it may show you how to do what you need.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Count how many lines of the positive file VADER scores as positive
pos_count = 0
pos_correct = 0

with open("D:/Corona_Vac/pythonprogramnet/Positive BOW.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # Only count lines whose negative score is at most 0.1
        if not vs['neg'] > 0.1:
            if vs['pos']-vs['neg'] > 0:
                pos_correct += 1
            pos_count +=1


# Same check in the other direction for negative lines
neg_count = 0
neg_correct = 0

# NOTE: this opens the same file as the positive block above;
# it presumably should point at the file of negative examples instead.
with open("D:/Corona_Vac/pythonprogramnet/Positive BOW.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # Only count lines whose positive score is at most 0.1
        if not vs['pos'] > 0.1:
            if vs['pos']-vs['neg'] <= 0:
                neg_correct += 1
            neg_count +=1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
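The two loops above differ only in which score is used as the filter and in the direction of the comparison, so they could be folded into one helper. This is a rough sketch, not a tested drop-in: `accuracy` and `toy_scorer` are hypothetical names, `toy_scorer` stands in for `analyzer.polarity_scores` so the snippet runs on its own, and skipping blank lines and guarding against an empty sample are additions that the loops above do not have.

```python
def accuracy(lines, scorer, expect_positive, cutoff=0.1):
    """Return (accuracy in percent, number of lines counted)."""
    opposite = "neg" if expect_positive else "pos"
    counted = correct = 0
    for line in lines:
        if not line.strip():
            continue  # addition: ignore blank lines
        vs = scorer(line)
        if vs[opposite] > cutoff:
            continue  # same filter as in the loops above
        diff = vs["pos"] - vs["neg"]
        if (diff > 0) if expect_positive else (diff <= 0):
            correct += 1
        counted += 1
    # addition: avoid dividing by zero when nothing was counted
    percent = correct / counted * 100.0 if counted else 0.0
    return percent, counted


def toy_scorer(text):
    """Hypothetical stand-in returning 'pos' and 'neg' like polarity_scores."""
    words = text.lower().split()
    total = max(len(words), 1)
    return {
        "pos": sum(w == "good" for w in words) / total,
        "neg": sum(w == "bad" for w in words) / total,
    }


sample = ["good good", "bad", "", "plain text"]
pos_acc, pos_n = accuracy(sample, toy_scorer, expect_positive=True)
neg_acc, neg_n = accuracy(sample, toy_scorer, expect_positive=False)
print("Positive accuracy = {}% via {} samples".format(pos_acc, pos_n))
print("Negative accuracy = {}% via {} samples".format(neg_acc, neg_n))
```

With VADER, `scorer` would be `analyzer.polarity_scores` and `lines` the lines read from each file.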

Hope this helps. Thanks.

Hashan Malawana