I am trying to plot real-time data as it is loaded into a DataFrame. So far, every attempt opens multiple blank figure windows as the data arrives, instead of plotting the data in a single figure.
I am implementing sentiment analysis on a live Twitter stream. I can stream the tweets, put them into a DataFrame, and apply the sentiment analysis algorithm to them one by one. I created a column in the DataFrame that holds the compound value the algorithm generates for each tweet.
This DataFrame is updated dynamically as the tweets stream in, and the intent is to plot the compound value against time in real time.
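To make the "dynamically updated DataFrame" idea concrete, here is a minimal sketch of one way to accumulate one row per tweet across callbacks. The names `records` and `add_record` and the sample timestamps and scores are hypothetical, used only for illustration; the time format is the one used in the code further down.

```python
from datetime import datetime

import pandas as pd

# Format of the "created_at" field, as used in convert_time() in the question.
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'

# Hypothetical module-level store: one dict per processed tweet.
records = []


def add_record(created_at, compound):
    """Append one (time, compound) observation and return the full DataFrame."""
    parsed = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    records.append({"time": parsed, "compound": compound})
    # Rebuilding from the list keeps every earlier row, not just the latest one.
    return pd.DataFrame(records)


# Made-up sample values standing in for two streamed tweets.
df = add_record("Wed Oct 10 20:19:24 +0000 2018", 0.5)
df = add_record("Wed Oct 10 20:19:30 +0000 2018", -0.3)
```

After the two calls, `df` holds both observations, so a plot of `compound` against `time` would have more than one point to draw.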
I have tried the commonly suggested approaches, such as calling plt.ion() and using plt.draw() instead of plt.show(). But instead of drawing one figure that is updated with new values, the program opens a new figure each time the data in the DataFrame is updated.
import pandas as pd
import csv
from bs4 import BeautifulSoup
import re
import tweepy
import ast
from pytz import timezone
from datetime import datetime
import matplotlib.pyplot as plt
import time
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from textblob import TextBlob
from unidecode import unidecode
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
ckey = '#######'
csecret = '#######'
atoken = '#########'
asecret = '#########'
class listener(StreamListener):
    def on_data(self, data):
        try:
            global df
            data = json.loads(data)
            time = data["created_at"]
            tweet = unidecode(data["text"])
            tweet1 = BeautifulSoup(tweet, "lxml").get_text()
            # Build a DataFrame holding the current tweet
            df = pd.DataFrame(columns=['time', 'tweet'])
            df['time'] = pd.Series(time)
            df['tweet'] = pd.Series(tweet1)
            # Convert the tweet timestamp to US/Eastern
            def convert_time(time):
                eastern = timezone('US/Eastern')
                utc = timezone('UTC')
                created_at = datetime.strptime(time, '%a %b %d %H:%M:%S %z %Y')
                est_created_at = created_at.astimezone(eastern)
                return (est_created_at)
            df['time'] = df['time'].apply(convert_time)
            def hour(time):
                hour = pd.DatetimeIndex(time).hour
                return hour
            df['hour'] = df['time'].apply(hour)
            # VADER compound score for the tweet text
            def sentiment_analysis(tweet):
                sid = SentimentIntensityAnalyzer()
                return (sid.polarity_scores(tweet)['compound'])
            df['compound'] = df['tweet'].apply(sentiment_analysis)
            #print(df['compound'])
            #print(df['time'])
            # Plotting attempt
            plt.ion()
            fig, ax = plt.subplots()
            df.plot(y='compound', ax=ax)
            ax.clear()
            ax.axis([0, 24, -5, 5])
            plt.xlabel('Time')
            plt.ylabel('Sentiment')
            plt.draw()
            plt.pause(0.2)
        except KeyError as e:
            print(str(e))
        return (True)
auth=OAuthHandler(ckey,csecret)
auth.set_access_token(atoken,asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["######"])
Expected Result - A single figure that keeps updating and plots the real-time data.
Actual Result - Multiple blank figures.
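To illustrate the expected result, here is a minimal sketch of a single figure being updated in place. It assumes simulated compound scores in place of the tweet stream, and the helper names `create_live_plot`, `update_live_plot` and `run_demo` are hypothetical. The idea is that the figure, axes and line are created once, and each new value is pushed into the existing line rather than into a fresh `plt.subplots()` call.

```python
import random

import matplotlib.pyplot as plt


def create_live_plot():
    """Create the figure, axes and an empty line exactly once."""
    plt.ion()  # interactive mode, so drawing does not block the program
    fig, ax = plt.subplots()
    (line,) = ax.plot([], [], marker="o")
    ax.set_ylim(-1, 1)  # VADER compound scores lie in [-1, 1]
    ax.set_xlabel("Time")
    ax.set_ylabel("Sentiment")
    return fig, ax, line


def update_live_plot(fig, ax, line, xs, ys):
    """Push new data into the existing line instead of opening a new figure."""
    line.set_data(xs, ys)
    ax.relim()                                    # recompute data limits
    ax.autoscale_view(scalex=True, scaley=False)  # rescale x only; y stays fixed
    fig.canvas.draw_idle()


def run_demo(n_points=30, pause=0.2):
    """Feed simulated compound scores (a stand-in for the tweet stream)."""
    fig, ax, line = create_live_plot()
    xs, ys = [], []
    for i in range(n_points):
        xs.append(i)
        ys.append(random.uniform(-1, 1))
        update_live_plot(fig, ax, line, xs, ys)
        plt.pause(pause)  # lets the GUI event loop redraw the figure
    return fig, line
```

Calling `run_demo()` should show one window whose line grows point by point. In the streaming case, the create step would presumably run once before the stream starts, with only the update step running per tweet.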
I apologize if I have left out any information.