
I want to do sentiment analysis of some sentences with Python and the TextBlob library. I know how to use it, but is there any way to set n-grams? Basically, I do not want to analyze word by word; I want to analyze phrases of 2 or 3 words, because phrases can carry much more meaning and sentiment.

For example, this is what I have done (it works):

from textblob import TextBlob

my_string = "This product is very good, you should try it"
blob = TextBlob(my_string)

sentiment = blob.sentiment.polarity
subjectivity = blob.sentiment.subjectivity

print(sentiment)
print(subjectivity)

But how can I apply, for example, n-grams = 2, n-grams = 3, etc.? Is it possible to do that with TextBlob or with the VaderSentiment library?

taga
  • what do you want to set? `mystring.ngrams(n=3)` will give you the 3grams – jeremy_rutman Dec 01 '19 at 12:03
  • Basically, I do not want to analyze sentiment 1 word by 1 word, but I want to analyze sentiment 2 words, 3 words etc – taga Dec 01 '19 at 12:06
  • you could make use of spaCy's noun-chunking feature, which forms more valuable phrases with less noise compared to the n-gram method. – Haridas N Dec 03 '19 at 10:48
  • Can you show me how to do that? Or better, to show me how to do that with n-grams and with spacy. – taga Dec 03 '19 at 10:51

2 Answers


Here is a solution that finds n-grams without using any libraries.

from textblob import TextBlob

def find_ngrams(n, input_sequence):
    # Split sentence into tokens.
    tokens = input_sequence.split()
    ngrams = []
    for i in range(len(tokens) - n + 1):
        # Take n consecutive tokens in array.
        ngram = tokens[i:i+n]
        # Concatenate array items into string.
        ngram = ' '.join(ngram)
        ngrams.append(ngram)

    return ngrams

if __name__ == '__main__':
    my_string = "This product is very good, you should try it"

    ngrams = find_ngrams(3, my_string)
    for ngram in ngrams:
        blob = TextBlob(ngram)
        print('Ngram: {}'.format(ngram))
        print('Polarity: {}'.format(blob.sentiment.polarity))
        print('Subjectivity: {}'.format(blob.sentiment.subjectivity))

To change the n-gram length, change the value of n passed to find_ngrams().
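For example, with n = 2 the function produces overlapping two-word windows (same logic as above, condensed into a list comprehension so the snippet stands alone):

```python
def find_ngrams(n, input_sequence):
    # Every window of n consecutive tokens, joined back into a string.
    tokens = input_sequence.split()
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(find_ngrams(2, "This product is very good"))
# ['This product', 'product is', 'is very', 'very good']
```

As noted in the comments, TextBlob itself also provides an ngrams(n=...) method that returns the same sliding windows, so you can use that instead of rolling your own.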


There is no parameter in TextBlob to define n-grams, as opposed to words/unigrams, as the features used for sentiment analysis.

TextBlob uses a polarity lexicon to calculate the overall sentiment of a text. This lexicon contains unigrams, which means it can only give you the sentiment of a single word, not of an n-gram with n > 1.
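As a rough sketch of what a pure unigram-lexicon approach does (using a tiny made-up lexicon, not TextBlob's actual one, which is far larger and whose pattern analyzer also applies some modifier rules): each known word is scored in isolation and the scores are averaged.

```python
# Toy unigram polarity lexicon; TextBlob's real lexicon is much larger.
LEXICON = {'good': 0.7, 'great': 0.8, 'bad': -0.7, 'terrible': -1.0}

def unigram_polarity(text):
    # Score each known word in isolation, then average the scores.
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(unigram_polarity("the product is good"))     # 0.7
print(unigram_polarity("good but also terrible"))  # (0.7 - 1.0) / 2, about -0.15
```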

I guess you could work around that by feeding bi- or tri-grams into the sentiment classifier the same way you would feed in a sentence, and then building a dictionary of your n-grams with their accumulated sentiment values. But I'm not sure that is a good idea. I'm assuming you are looking at bigrams to address problems like negation ("not bad"), and the lexicon approach won't be able to use "not" to flip the sentiment value of "bad".
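A minimal sketch of that workaround, with a toy unigram lexicon standing in for TextBlob's analyzer, also shows the negation problem: "not" contributes nothing, so "not bad" scores the same as "bad" alone.

```python
def find_ngrams(n, text):
    # Sliding windows of n consecutive tokens.
    tokens = text.split()
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

LEXICON = {'good': 0.7, 'bad': -0.7}  # toy stand-in for TextBlob's lexicon

def lexicon_polarity(phrase):
    hits = [LEXICON[w] for w in phrase.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

sentence = "this product is not bad"
bigram_sentiment = {bg: lexicon_polarity(bg) for bg in find_ngrams(2, sentence)}
print(bigram_sentiment['not bad'])  # -0.7: 'not' does not flip 'bad'
```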

TextBlob also contains an option to use a Naive Bayes classifier instead of the lexicon approach. It is trained on a movie review corpus provided by NLTK, but the default features for training are words/unigrams, as far as I can make out from peeking at the source code. You might be able to implement your own feature extractor there to extract n-grams instead of words, re-train the classifier accordingly, and use it on your data.
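As a rough sketch of that feature-extractor idea: textblob's classifiers take a feature_extractor callable that maps a document to a dictionary of features (check the textblob.classifiers documentation for the exact signature before relying on this). A bigram version might look like:

```python
def bigram_extractor(document):
    # Map a document to a dict of bigram-presence features,
    # the kind of feature dict NLTK-style classifiers consume.
    tokens = document.split()
    bigrams = zip(tokens, tokens[1:])
    return {'contains({} {})'.format(a, b): True for a, b in bigrams}

print(bigram_extractor('not bad at all'))
# {'contains(not bad)': True, 'contains(bad at)': True, 'contains(at all)': True}
```

You would then presumably pass it in when constructing the classifier, e.g. NaiveBayesClassifier(train, feature_extractor=bigram_extractor), and re-train on your own labeled data.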

Regardless of all that, I would suggest using a combination of unigrams and n>1-grams as features, because dropping unigrams entirely is likely to hurt your performance. Bigrams are much more sparsely distributed, so you will run into data-sparsity problems when training.
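The sparsity point can be illustrated on a toy corpus: the fraction of feature types seen only once (a rough sparsity measure) is noticeably higher for bigrams than for unigrams even in three short sentences.

```python
from collections import Counter

corpus = [
    "this product is very good",
    "this product is not bad",
    "the product is very bad",
]

# Count unigram and bigram feature types across the corpus.
uni = Counter(w for s in corpus for w in s.split())
bi = Counter(b for s in corpus for b in zip(s.split(), s.split()[1:]))

# Fraction of types that occur only once.
uni_singletons = sum(1 for c in uni.values() if c == 1) / len(uni)
bi_singletons = sum(1 for c in bi.values() if c == 1) / len(bi)
print(uni_singletons, bi_singletons)  # 0.375 vs 0.625 on this toy corpus
```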

Schnipp