
I'm a beginner Python programmer and I am finding it hard to figure out a simple use of the Tweepy streaming API.

Basically, I am trying to do the following:

  1. Stream tweets in Portuguese.

  2. Show the sentiment of each tweet.

I am unable to stream tweets in a specific language. Could someone please help me figure out what I am doing wrong?

import tweepy
from textblob import TextBlob
# I have the keys set in these variables

auth = tweepy.OAuthHandler(CONSUMER_KEY,CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN,ACCESS_TOKEN_SECRET)
API = tweepy.API(auth)


class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print("--------------------")
        print(status.text)
        analysis = TextBlob(status.text)

        if analysis.sentiment.polarity > 0:
            print("sentiment is Positive")
        elif analysis.sentiment.polarity == 0:
            print("sentiment is Neutral")
        else:
            print("sentiment is Negative")
        print("--------------------\n")


myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = API.auth, listener=myStreamListener, tweet_mode='extended', lang='pt')

myStream.filter(track=['trump'])

Example output:

RT @SAGEOceanTweets: Innovation Hack Week 2019: @nesta_uk is exploring the possibility of holding a hack week in 2019, focused on state-of-…

However, it stops after a few tweets and I get this error:

      return codecs.charmap_encode(input,self.errors,encoding_table)[0]
      UnicodeEncodeError: 'charmap' codec can't encode 
      character '\U0001f4ca' in position 76: character maps to <undefined>
      [Finished in 85.488s]

Also, the tweets are not in Portuguese. How can I stream continuously, get only tweets that are in Portuguese, and perform sentiment analysis on them?

Could you please also guide me on how to stream tweets in a given language and then analyze their sentiment using TextBlob?

Thank you

Stramzik

1 Answer

This code can help you achieve your goal:

NLP Twitter Streaming Mood

It collects data from Twitter and analyzes mood. However, if you want to develop sentiment analysis for Portuguese, you should use word embeddings from a Word2Vec model trained on Portuguese Wikipedia. That's the only way you can do it reliably. NLTK and Gensim work better with English; NLTK in particular is very limited for Portuguese.

import re

import nltk
import tweepy
from nltk import pos_tag, sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
from tweepy import OAuthHandler

# The NLTK calls below need tokenizer, WordNet and tagger data, downloaded once
# (resource names can vary by NLTK version), for example:
# nltk.download('punkt'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')

# Placeholder credentials; replace with your own keys
consumer_key = '12345'
consumer_secret = '12345'
access_token = '123-12345'
access_secret = '12345'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

# Collect recent tweets matching the query and strip URLs from their text
number_tweets = 100
data = []
for status in tweepy.Cursor(api.search, q="trump").items(number_tweets):
    url_less_text = re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', status.text)
    data.append(url_less_text)

lemmatizer = WordNetLemmatizer()

# Join the tweets into one string before tokenizing
text = " ".join(data)

sentences = sent_tokenize(text)
print(sentences)

# Tokenize into words and lemmatize each token
tokens = [lemmatizer.lemmatize(token) for token in word_tokenize(text)]
print(len(tokens))

# Part-of-speech tag the lemmatized tokens
tagged_tokens = pos_tag(tokens)
print(tagged_tokens)
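
As a side note on the `UnicodeEncodeError` in the question: the traceback suggests the console is using a legacy 'charmap' encoding (such as cp1252 on Windows) that cannot represent emoji like '\U0001f4ca'. One possible workaround, sketched here under that assumption, is to replace unencodable characters before printing. `console_safe` is a hypothetical helper name, not part of Tweepy or TextBlob.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Hypothetical helper: fall back to the console's encoding, then UTF-8, when none is given
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(target, errors="replace").decode(target)


# Simulate a cp1252 console; the chart emoji cannot be encoded there
print(console_safe("Poll results \U0001f4ca", encoding="cp1252"))
```

Inside `on_status`, this would mean calling `print(console_safe(status.text))`. Setting the environment variable `PYTHONIOENCODING=utf-8` before running the script is another common option.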
razimbres
  • Hi Rubens, for the time being let's forget about the sentiment analyzer. If I just want to stream tweets written in Portuguese, how can I do that? I have updated my code, which now streams and shows the sentiment, but I want it to run on Portuguese tweets. – Stramzik Apr 22 '19 at 22:30
  • I mean, just stream tweets written in Portuguese – Stramzik Apr 22 '19 at 22:41
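
For the narrower question of streaming only Portuguese tweets: in Tweepy 3.x the language restriction is, as far as I know, passed to `filter()` as `languages=['pt']` (for example `myStream.filter(track=['trump'], languages=['pt'])`), whereas extra keyword arguments such as `lang` on the `tweepy.Stream` constructor appear to be ignored. As an additional client-side check, each status carries a machine-detected `lang` field. A minimal sketch of that check follows; `is_language` and the stand-in statuses are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def is_language(status, lang_code="pt"):
    """Return True when the status's machine-detected language equals lang_code."""
    # Statuses without a lang attribute are treated as non-matching
    return getattr(status, "lang", None) == lang_code


# Stand-in objects for illustration; real statuses would arrive in on_status
sample_statuses = [
    SimpleNamespace(text="Bom dia a todos", lang="pt"),
    SimpleNamespace(text="Good morning everyone", lang="en"),
]

portuguese_only = [s.text for s in sample_statuses if is_language(s)]
print(portuguese_only)
```

In a listener, the same check could be used at the top of `on_status` to skip any status that is not tagged `pt` before running the sentiment step.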