
Goal

To run sentiment analysis on a column of text in a pandas dataframe, returning both a score and a magnitude value for each row of text.

Current code

This is what I'm running, pulling in a dataframe (df03) with a column of text (text02) that I want to analyze.

# Imports the Google Cloud client library
from google.cloud import language_v1

# Instantiates a client
client = language_v1.LanguageServiceClient()

# The text to analyze
text = df03.loc[:,"text02"]
document = language_v1.Document(
    content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT
)

# Detects the sentiment of the text
sentiment = client.analyze_sentiment(
    request={"document": document}
).document_sentiment

print("Text: {}".format(text))
print("Sentiment: {}, {}".format(sentiment.score, sentiment.magnitude))

And this is the returned error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1c6f7c607084> in <module>()
      8 text = df03.loc[:,"text02"]
      9 document = language_v1.Document(
---> 10     content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT
     11 )
     12 

/usr/local/lib/python3.7/dist-packages/proto/message.py in __init__(self, mapping, ignore_unknown_fields, **kwargs)
    562 
    563         # Create the internal protocol buffer.
--> 564         super().__setattr__("_pb", self._meta.pb(**params))
    565 
    566     def _get_pb_type_from_key(self, key):

TypeError: 01                          Max Muncy is great!
02               The worst Dodger is Max muncy.
03   has type Series, but expected one of: bytes, unicode

Assessment

The error message points to the line:

content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT

The TypeError message attempts to explain what's happening:

has type Series, but expected one of: bytes, unicode

So it seems to recognize the text blurbs in the text02 column of dataframe df03, but apparently I haven't set the right data type.

However, I'm not sure where I'm supposed to set the Type, as the only Document Type settings in the documentation appear to be HTML, PLAIN_TEXT, or TYPE_UNSPECIFIED. Of those, I'm pretty sure PLAIN_TEXT is right.

Documentation: https://googleapis.dev/python/language/latest/language_v1/types.html#google.cloud.language_v1.types.Document

So that leaves me unclear on what that error message is indicating or how I should approach troubleshooting.
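For reference, the mismatch can be reproduced with pandas alone: selecting a column with `.loc` yields a Series of strings, not a single string, which is why the API rejects it. A minimal sketch using the sample rows from the traceback:

```python
import pandas as pd

# Rebuild the relevant column from the rows visible in the traceback
df03 = pd.DataFrame(
    {"text02": ["Max Muncy is great!", "The worst Dodger is Max muncy."]}
)

text = df03.loc[:, "text02"]
print(type(text))          # the whole column is a Series...
print(type(text.iloc[0]))  # ...while each element is a plain str
```

Google's `Document` accepts only a single bytes/str value for `content`, so the Series as a whole is the wrong type even though each element would be fine.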

Greatly appreciate any input on this.

doug

dsx

1 Answer


It looks like Google's API can't handle a pandas Series directly, but expects you to pass one string at a time. Try applying a custom function to the DataFrame column which contains your text:

def get_sentiment(text):
    # The text to analyze
    document = language_v1.Document(
        content=text,
        type_=language_v1.types.Document.Type.PLAIN_TEXT
    )

    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment

    return sentiment


df03["sentiment"] = df03["text02"].apply(get_sentiment)
Peter Leimbigler
  • Thank you! That worked, only it placed the sentiment magnitude and score in the same column, whereas I'd like the numerical values in separate columns with the descriptors as the column headers. I'm looking into that, and am wondering if it's something I should be able to do with `Series.str.split`? – dsx Mar 28 '22 at 21:50
  • 2
    Hi OP, since your main question has been resolved (and it worked based on your comment without any error) and you have your returned values, I suggest you accept @PeterLeimbigler 's answer and then post your additional inquiry as new question since it will tackle separation of column of the returned series values. – Scott B Mar 29 '22 at 06:55
  • 1
    Glad this worked @dsx and thanks @ScottB! And yes, `Series.str.split()` would be the first thing I try - if this doesn't work, that could warrant a new question. – Peter Leimbigler Mar 29 '22 at 13:27
  • Thank you both for your helpful input. Doing as suggested. – dsx Mar 29 '22 at 14:48
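Following up on the column-splitting discussion in the comments: rather than splitting the string representation of the sentiment object, one option is to have the helper return the two numbers and expand them with `pd.Series`. A minimal sketch with a placeholder standing in for the real API call (in the actual code, the function would call `client.analyze_sentiment` and return `sentiment.score, sentiment.magnitude`):

```python
import pandas as pd

# Placeholder for the real Google Cloud call; the actual helper would build
# the Document, call client.analyze_sentiment, and return the two floats
# (sentiment.score, sentiment.magnitude).
def get_sentiment_values(text):
    return 0.5, 0.9  # dummy score and magnitude

df03 = pd.DataFrame({"text02": ["Max Muncy is great!"]})

# Returning a pd.Series from apply() expands the tuple into separate
# columns, which can then be assigned under the desired headers.
df03[["score", "magnitude"]] = df03["text02"].apply(
    lambda t: pd.Series(get_sentiment_values(t), index=["score", "magnitude"])
)
print(df03)
```

This avoids round-tripping through strings entirely, at the cost of one extra line in the helper.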