The tweets I capture when streaming with Tweepy contain Unicode escape sequences instead of the letters they represent, and I need the actual letters. I have found many solutions on the site, but none of them seemed to work or even to apply to my case, since I'm collecting tweets in real time. Can anyone help?
Here’s my code:
```python
from urllib3.exceptions import ProtocolError
from tweepy import Stream
from tweepy.auth import OAuthHandler
from tweepy.streaming import StreamListener
import time

ckey = 'your code here'
csecret = 'your code here'
atoken = 'your code here'
asecret = 'your code here'

class listener(StreamListener):
    def on_data(self, data):
        while True:
            try:
                #print (data)
                tweet = data.split(',"text":"')[1].split('","')[0]
                tweet2 = data.split(',"screen_name":"')[1].split('","location')[0]
                print (tweet2,tweet)
                saveFile = open ('test.csv','a')
                saveFile.write('@')
                saveFile.write(tweet2)
                saveFile.write(';')
                saveFile.write(tweet)
                saveFile.write('\n')
                saveFile.close()
                return True
            except ProtocolError:
                continue
            except BaseException as e:
                print ('Failed on data', str(e))
                break

    def on_error(self, status):
        print (status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=['keyword'])
```
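For comparison, here is a small self-contained sketch (no network access or credentials needed) that contrasts the string-splitting extraction used above with parsing the payload through Python's `json` module. `RAW` is a hypothetical, heavily trimmed stand-in for a stream payload, and `extract_by_split`/`extract_by_json` are illustrative names. Because `on_data()` receives the raw JSON text, slicing it keeps the JSON escape sequences (`\u00e3`, `\/`) as literal characters, while `json.loads` decodes them.

```python
import json

# Hypothetical raw payload, trimmed to a few fields. The stream sends JSON text,
# so "ã" arrives as the escape \u00e3 and "/" as \/.
RAW = ('{"id_str":"1","text":"Pq n\\u00e3o levar o Pato https:\\/\\/t.co\\/x",'
       '"user":{"id_str":"2","screen_name":"adrianabpadilha","location":"RJ"}}')


def extract_by_split(data):
    """String slicing as in the listener above: escapes are left untouched."""
    tweet = data.split(',"text":"')[1].split('","')[0]
    name = data.split(',"screen_name":"')[1].split('","location')[0]
    return name, tweet


def extract_by_json(data):
    """Parse the payload as JSON: the decoder turns escapes into characters."""
    status = json.loads(data)
    return status["user"]["screen_name"], status["text"]


print(extract_by_split(RAW))  # text still contains \u00e3 and \/
print(extract_by_json(RAW))   # text contains "ã" and plain "/"
```

With this approach the listener would call `json.loads(data)` once and read `status["text"]` and `status["user"]["screen_name"]`, instead of depending on the exact order of keys in the raw text.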
Here's my output for the keyword "fluminense":
```
adrianabpadilha Impressionante como mesmo com poucas op\u00e7\u00f5es para o banco o Burro s\u00f3 me sobe o Wisney e o Higor! Pq n\u00e3o levar o Pato\u2026 https:\/\/t.co\/lO4CJJsaaP
Miguel_Aalmeida RT @pulligffc: O Fluminense em dia de jogo olha pra mim e faz isso
TRANQUILINHO3 Time fdpt \ud83d\ude20
LeleoCasttroo @jrmenini @FFvinho Palmeiras e Fluminense ainda tiveram a base como fonte de renda, atl\u00e9tico n\u00e3o revela um jogador\u2026 https:\/\/t.co\/ZF8awS6pDt
SouzaArthur6 @CezarSabia @andreisilvasoar @ndrzej87 @futebol_info C\u00e9zar, existe um tempo certo de testagem, q se d\u00e1 no 5\u00b0 da doe\u2026 https:\/\/t.co\/zmBlBzafdo
Thomasrodrigue_ @renatojr_07 \u00c9 o mesmo exemplo da final da ta\u00e7a rio, a \u00fanica coisa que muda \u00e9 que na final n\u00e3o tinha jogador contam\u2026 https:\/\/t.co\/3Q2nCBw9XS
```
As you can see, characters such as "ç" and "õ" appear as "\u00e7" and "\u00f5", respectively.
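The sequences in the output are JSON string escapes, so a fragment that has already been cut out of the payload can, under some assumptions, be decoded by wrapping it in quotes and passing it to `json.loads`. The sketch below uses a hypothetical helper, `decode_json_fragment`, and assumes the fragment contains no unescaped double quotes and does not end in the middle of an escape. It also shows that the emoji escape `\ud83d\ude20` is a UTF-16 surrogate pair, which the decoder combines into the single character U+1F620.

```python
import json


def decode_json_fragment(fragment):
    """Decode the escape sequences in a piece of JSON string content.

    Hypothetical helper. Assumes `fragment` was cut from inside a JSON string,
    contains no unescaped double quotes, and does not end mid-escape.
    """
    # Wrapping the fragment in quotes turns it into a complete JSON string literal.
    return json.loads('"' + fragment + '"')


# Escaped fragments copied from the output above.
print(decode_json_fragment("op\\u00e7\\u00f5es"))        # the "ç"/"õ" case
print(decode_json_fragment("Time fdpt \\ud83d\\ude20"))  # emoji sent as a surrogate pair
print(decode_json_fragment("https:\\/\\/t.co\\/lO4CJJsaaP"))
```

In the listener this would mean decoding `tweet` and `tweet2` before printing and saving them. When writing to `test.csv`, opening the file with `open('test.csv', 'a', encoding='utf-8')` avoids relying on the platform's default encoding, which on some systems (for example Windows with cp1252) cannot encode emoji. Parsing the whole payload with `json.loads` remains the sturdier option; this helper is only a minimal-change patch.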
Thank you!