1

I'm using the NLTK package and it has a function that tells me whether a given sentence is positive, negative, or neutral:

from nltk.sentiment.util import demo_liu_hu_lexicon

demo_liu_hu_lexicon('Today is a an awesome, happy day')
>>> Positive

The problem is that the function has no return statement: it just prints "Positive", "Negative", or "Neutral" to stdout. All it returns, implicitly, is None. (Here's the function's source code.)

Is there any way I can capture this output (other than messing with the NLTK source code on my machine)?

Parzival

3 Answers

3
import sys
from io import StringIO

class capt_stdout:
    def __init__(self):
        self._stdout = None
        self._string_io = None

    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, type, value, traceback):
        sys.stdout = self._stdout

    @property
    def string(self):
        return self._string_io.getvalue()

Use it like this:

with capt_stdout() as out:
    demo_liu_hu_lexicon('Today is a an awesome, happy day')
    demo_liu_hu_lexicon_output = out.string
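
On Python 3.4+, the standard library's contextlib.redirect_stdout can do the same job as a hand-written context manager. A minimal sketch, assuming a stand-in function print_label (hypothetical, used only so the example runs without NLTK) in place of demo_liu_hu_lexicon; capture_printed is likewise a helper name invented for this example:

```python
import io
from contextlib import redirect_stdout


def print_label(sentence):
    """Hypothetical stand-in for demo_liu_hu_lexicon: prints, returns None."""
    print('Positive')


def capture_printed(func, *args, **kwargs):
    """Call func and return whatever it printed to stdout, stripped."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    # print() appends a newline, so strip surrounding whitespace.
    return buffer.getvalue().strip()


label = capture_printed(print_label, 'Today is an awesome, happy day')
```

With NLTK installed, passing demo_liu_hu_lexicon instead of print_label should capture its printed tag in the same way.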
alexisdevarennes
1

TL;DR

The demo_liu_hu_lexicon function is a demo of how you could use the opinion_lexicon. It's meant for testing and should not be used directly.


In Long

Let's look at the function and see how we can re-create a similar one: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616

def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()

Okay, it's strange for the imports to live inside the function, but that's because this is a demo function used for simple testing or documentation.

Also, the use of treebank.TreebankWordTokenizer() is rather odd; we can simply use nltk.word_tokenize.

Let's move the imports out and rewrite demo_liu_hu_lexicon as a simple_sentiment function.

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    pass

Next, we see

def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

The function

  1. first tokenizes and lower-cases the sentence,
  2. initializes the counts of positive and negative words,
  3. initializes x and y for some plotting later, so let's ignore that.

If we go further down the function:

def demo_liu_hu_lexicon(sentence, plot=False):
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

    for word in tokenized_sent:
        if word in opinion_lexicon.positive():
            pos_words += 1
            y.append(1) # positive
        elif word in opinion_lexicon.negative():
            neg_words += 1
            y.append(-1) # negative
        else:
            y.append(0) # neutral

    if pos_words > neg_words:
        print('Positive')
    elif pos_words < neg_words:
        print('Negative')
    elif pos_words == neg_words:
        print('Neutral')

  4. The loop goes through each token and checks whether the word is in the positive or negative lexicon.

  5. At the end, it compares the number of positive and negative words and prints the tag.
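
The counting and classification steps above can be sketched on their own, without NLTK. This is only an illustration: the function name classify, the tiny POSITIVE and NEGATIVE sets, and the whitespace tokenization are all assumptions made for the example, not part of NLTK.

```python
def classify(tokens, positive, negative):
    """Return 'Positive', 'Negative' or 'Neutral' by counting lexicon hits."""
    pos_words = 0
    neg_words = 0
    for token in tokens:
        if token in positive:
            pos_words += 1
        elif token in negative:
            neg_words += 1
        # Tokens found in neither lexicon are neutral and not counted.
    if pos_words > neg_words:
        return 'Positive'
    if pos_words < neg_words:
        return 'Negative'
    return 'Neutral'


# Tiny stand-in lexicons (assumption). With NLTK you would instead pass
# set(opinion_lexicon.positive()) and set(opinion_lexicon.negative()).
POSITIVE = {'awesome', 'happy'}
NEGATIVE = {'awful', 'sad'}

tokens = 'today is an awesome , happy day'.split()
label = classify(tokens, POSITIVE, NEGATIVE)
```

Passing the lexicons in as sets built once means each membership test is a hash lookup rather than a scan of the whole word list for every token.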

Now let's see whether we can write a better simple_sentiment function, now that we know what demo_liu_hu_lexicon does.

Tokenization in step 1 can't be avoided, so we have:

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]

The lazy way out for steps 2-5 is to copy and paste the code and change each print() to a return:

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]

    # Counters must be initialized before the loop.
    pos_words = 0
    neg_words = 0

    for word in tokens:
        if word in opinion_lexicon.positive():
            pos_words += 1
        elif word in opinion_lexicon.negative():
            neg_words += 1
        # Words in neither lexicon are neutral; the plotting list y is dropped.

    if pos_words > neg_words:
        return 'Positive'
    elif pos_words < neg_words:
        return 'Negative'
    else:
        return 'Neutral'

Now you have a function whose return value you can use however you please.


BTW, the demo is really odd..

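When the demo sees a positive word it appends 1 to y, and when it sees a negative word it appends -1, but y is only used for the plot. The label itself comes from comparing the integer counters pos_words and neg_words, so pos_words > neg_words is a plain integer comparison.

Had the counts been kept as lists instead, the comparison would follow Python's lexicographic sequence ordering, which has no linguistic or mathematical meaning here =( (See What happens when we compare list of integers?)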
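
For reference, a short illustration of how Python compares integers versus lists. The values are made up for the example and are not taken from the NLTK demo:

```python
# Integer counters compare by value.
pos_words, neg_words = 1, 3
int_result = pos_words > neg_words   # False: 1 is not greater than 3

# Lists compare element by element, left to right (lexicographically),
# so the first differing element decides the outcome.
list_result = [1] > [0, 0, 0]        # True: 1 > 0 at index 0

# If one list is a prefix of the other, the longer list is the greater one.
prefix_result = [1, 1] > [1]         # True
```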

alvas
0
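import sys
from io import StringIO

from nltk.sentiment.util import demo_liu_hu_lexicon

stdout_ = sys.stdout            # remember the real stdout
stream = StringIO()
sys.stdout = stream             # print() now writes into the buffer
try:
    demo_liu_hu_lexicon('PLACE YOUR TEXT HERE')
finally:
    sys.stdout = stdout_        # always restore stdout, even on error
sentiment = stream.getvalue()
sentiment = sentiment[:-1]      # drop the trailing newline added by print()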
David Veitch
    Please, consider giving some explanation on the code (why, how and insights) to teach the person who posted the question to understand it in depth. – Cabrra Nov 21 '18 at 23:04