I know this question has been asked a hundred times, but I have been stuck for so many hours that I need help. I stream Twitter events with the help of a library called "twython". After streaming, I save the tweets to a CSV file. The tweets are UTF-8 encoded and stored in a variable, which I want to put into an array of tweets. While putting them into the array, the encoding breaks...
My code looks like this at the moment:
from twython import TwythonStreamer
import unicodecsv as csv
import sys
import datetime

reload(sys)
sys.setdefaultencoding('utf8')

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            screenname = "@" + data['user']['screen_name']
            tweettext = data['text'].encode('utf-8')
            uid = str(data['id'])
            timestamp = int(data['timestamp_ms'])
            date = datetime.datetime.fromtimestamp(timestamp/1000)
            tweet = str(date) + "," + screenname + "," + uid + "," + tweettext
            print tweet
            tweets.append(tweet)
            for tweet in tweets:
                tweet.encode('utf-8')
            print tweets
            if len(tweets) == 1:
                self.disconnect()

    def on_error(self, status_code, data):
        print status_code, data

# Requires Authentication as of Twitter API v1.1
stream = MyStreamer(consumer_key, consumer_secret,
                    access_token, access_token_secret)
stream.statuses.filter(track='#stackoverflow')

resultFile = open("tweets.csv", 'a')
writer = csv.writer(resultFile, dialect='excel', encoding='utf-8')
for tweet in tweets:
    writer.writerow([tweet])
The variable "tweet" is fine and encoded properly, e.g. "I like üüüüüüü". The array "tweets", however, then looks like "I like \xfc\xfc\xfc\xfc\xfc\xfc\xfc". Sorry for the mess, I have tried almost every method mentioned here. Thanks in advance.
Max
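
The difference described above (a single tweet prints fine, the whole array shows "\xfc") can be reproduced in isolation. The following is only a minimal sketch, not the streaming code itself: it is written in Python 3 syntax, the sample text is hypothetical, and ascii() is used as a stand-in for the escaping Python 2 applies to each element when a whole list is printed. Under those assumptions it suggests that the escapes may be a display form only, with the stored text left intact.

```python
# Sketch in Python 3 syntax. ascii() stands in for the escaping that
# Python 2 applies to each element when a whole list is printed.

tweet = "I like \u00fc\u00fc\u00fc"  # hypothetical sample text with three u-umlauts
tweets = [tweet]

# Escaped, display-only form of the element (what a printed list would show).
shown_in_list = ascii(tweets[0])
print(shown_in_list)  # backslash escapes such as \xfc instead of the umlaut

# The stored string itself is unchanged and still encodes to valid UTF-8.
utf8_bytes = tweets[0].encode("utf-8")
print(len(tweets[0]), len(utf8_bytes))
```

If the same holds for the real data, printing each element on its own (or writing it to the CSV) should show the umlauts, even though printing the array as a whole shows the escaped form.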