There is no straightforward way to add a bigram to the VADER lexicon, because VADER scores individual tokens during sentiment analysis. However, you can work around this with the following steps:
- Create bigrams as tokens. For example, you can convert the bigram ("no issues") into a token ("noissues").
- Maintain a dictionary of the polarity of the newly created tokens, e.g. {"noissues": 2}, and add these entries to the analyser's lexicon so that they can be scored.
- Then perform additional text processing before passing the text for sentiment score calculation.
The following code accomplishes the above:

    import nltk
    # Assumption: `analyser` is NLTK's VADER analyser (needs nltk.download('vader_lexicon')).
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyser = SentimentIntensityAnalyzer()

    allowed_bigrams = {'noissues': 2}  # add more as per your requirement
    analyser.lexicon.update(allowed_bigrams)  # register the new tokens so VADER can score them

    def process_text(text):
        tokens = text.lower().split()  # list of tokens
        bigrams = list(nltk.bigrams(tokens))  # create bigrams as tuples of tokens
        bigrams = list(map(''.join, bigrams))  # join each pair without a space to create the new token
        bigrams.append('...')  # pad so the bigram list is as long as the token list
        # begin recreating the text
        final = []
        skip = False
        for token, b in zip(tokens, bigrams):
            if skip:  # this word was already merged into the previous bigram
                skip = False
            elif b in allowed_bigrams:
                final.append(b)  # replace the word pair in the text by the bigram token
                skip = True  # skip the next word
            else:
                final.append(token)
        return ' '.join(final)

    text = 'Hello, I have no issues with you'
    print(text)
    print(analyser.polarity_scores(text))
    final = process_text(text)
    print(final)
    print(analyser.polarity_scores(final))

The output:

    Hello, I have no issues with you
    {'neg': 0.268, 'neu': 0.732, 'pos': 0.0, 'compound': -0.296}
    hello, i have noissues with you
    {'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.4588}

Notice in the output how the two words "no" and "issues" have been joined to form the bigram token "noissues".
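
One limitation of `process_text` is that it matches bigrams after a plain `split()`, so a pair followed by punctuation (for example "no issues.") becomes "noissues." and is not found in the dictionary. The following is a minimal sketch of a punctuation-tolerant variant that uses only the standard library. The names `bigram_polarity`, `build_token_lexicon` and `merge_bigrams` are hypothetical helpers introduced for illustration; they are not part of VADER or NLTK.

```python
import re

# Hypothetical input: human-readable bigram phrases mapped to a polarity score.
bigram_polarity = {"no issues": 2.0}

def build_token_lexicon(phrases):
    """Turn {'no issues': 2.0} into {'noissues': 2.0}."""
    return {"".join(p.lower().split()): score for p, score in phrases.items()}

# Splits one whitespace-delimited chunk into
# (leading punctuation, core word, trailing punctuation).
_CHUNK = re.compile(r"^(\W*)(.*?)(\W*)$")

def merge_bigrams(text, token_lexicon):
    """Replace each allowed word pair with its joined token, keeping punctuation."""
    chunks = text.split()
    out = []
    i = 0
    while i < len(chunks):
        if i + 1 < len(chunks):
            lead, first, mid_trail = _CHUNK.match(chunks[i]).groups()
            mid_lead, second, trail = _CHUNK.match(chunks[i + 1]).groups()
            joined = (first + second).lower()
            # Merge only when no punctuation sits between the two words.
            if (first and second and not mid_trail and not mid_lead
                    and joined in token_lexicon):
                out.append(lead + joined + trail)
                i += 2  # the second word has been consumed
                continue
        out.append(chunks[i])
        i += 1
    return " ".join(out)

token_lexicon = build_token_lexicon(bigram_polarity)
print(merge_bigrams("Hello, I have no issues.", token_lexicon))
```

The resulting `token_lexicon` would then be registered with `analyser.lexicon.update(token_lexicon)` before scoring, as with `allowed_bigrams`. Unlike `process_text`, this sketch leaves the case of the unmerged words untouched, which matters if you want VADER to still see capitalised words as written.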