I know this question has been asked a hundred times, but I have been stuck for so many hours that I need help. I stream Twitter events with a library called "twython". After streaming, I save the tweets to a CSV file. The tweets are UTF-8 encoded and stored in a variable, which I want to append to an array of tweets. When I put them in the array, the encoding breaks...

My code looks like this at the moment:

from twython import TwythonStreamer
import unicodecsv as csv
import sys
import datetime
reload(sys)
sys.setdefaultencoding('utf8')

# Collected tweet lines (assumed; the list is not defined in the original snippet)
tweets = []

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            screenname = "@" + data['user']['screen_name']
            tweettext = data['text'].encode('utf-8')
            uid = str(data['id'])
            timestamp = int(data['timestamp_ms'])
            date = datetime.datetime.fromtimestamp(timestamp/1000)
            tweet=str(date) + "," + screenname + "," + uid + "," + tweettext
            print tweet
            tweets.append(tweet)
            for tweet in tweets:
                tweet.encode('utf-8')
            print tweets
            if len(tweets)==1:
                self.disconnect()

    def on_error(self, status_code, data):
        print status_code, data

# Requires Authentication as of Twitter API v1.1
stream = MyStreamer(consumer_key, consumer_secret,
                access_token, access_token_secret)


stream.statuses.filter(track='#stackoverflow')

resultFile = open("tweets.csv",'a')
writer = csv.writer(resultFile,dialect='excel',encoding='utf-8')
for tweet in tweets:
    writer.writerow([tweet])
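
The final loop above writes each joined tweet string as a single cell. A minimal sketch of an alternative, assuming Python 3 and the standard `csv` module in place of `unicodecsv`, keeps the fields as separate columns and lets the file object handle the UTF-8 encoding. The helper name `write_tweets` and the sample row are hypothetical and only serve the illustration.

```python
import csv
import os
import tempfile


def write_tweets(path, rows):
    """Append rows to a CSV file as UTF-8; each row is a list of text fields."""
    # newline='' lets the csv module control the line endings itself
    with open(path, 'a', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, dialect='excel')
        writer.writerows(rows)


# Hypothetical sample row: date, screen name, tweet id, tweet text
rows = [['2017-06-20 21:00:00', '@example_user', '123', 'I like üüü, really']]

path = os.path.join(tempfile.mkdtemp(), 'tweets.csv')
write_tweets(path, rows)

# Inspect the raw bytes on disk, then parse the file again as UTF-8
with open(path, 'rb') as f:
    raw = f.read()
with open(path, encoding='utf-8', newline='') as f:
    read_back = list(csv.reader(f))
```

Here `raw` holds the bytes on disk and `read_back` holds the rows parsed again; the comma inside the tweet text survives because the writer quotes that field.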

The variable "tweet" is fine and encoded properly, e.g. "I like üüüüüüü". The array "tweets" will then look like "I like \xfc\xfc\xfc\xfc\xfc\xfc\xfc". Sorry for the mess; I tried almost every method mentioned here. Thanks in advance.
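
One possible reading of this symptom: in Python 2, printing a list shows the repr() of each element, and repr() writes non-ASCII characters as escapes such as \xfc, while printing a single string shows the characters themselves. Under that reading the data in the list is probably intact and only its display differs. The sketch below assumes Python 3, where the built-in `ascii()` produces the same kind of escaped output that Python 2's repr() gives; the sample text is made up.

```python
tweet = 'I like üüü'
tweets = [tweet]

# ascii() escapes non-ASCII characters, much like Python 2's repr() of a list
shown_in_list = ascii(tweets)

# The element stored in the list is still the original, unescaped string
element = tweets[0]

print(shown_in_list)
print(element == tweet)
```

If this is what is happening, the escapes only appear when the whole list is printed, and writing the individual elements to a file is unaffected.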

Max

hypePG
  • `ü` encoded as UTF-8 should be `C3 BC`, not `FC`. And is your problem that the CSV writer is writing `\xfc`? Because that should be parsed properly when loading it again – Artyer Jun 20 '17 at 21:26
  • Shit. Then I am probably looking for the wrong problem. Yeah, I get strings like \xfc or "BÌ_rgern", which should be "Bürgern", and characters like "\u2026", which is the utf encoding for a smiley. Any idea? – hypePG Jun 20 '17 at 21:33
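
The values mentioned in these comments can be checked directly. The sketch below assumes Python 3 and uses only the standard library: it shows that FC is the code point (and the Latin-1 byte) of `ü`, that its UTF-8 encoding is C3 BC, and it looks up the Unicode name of `\u2026`. The last line illustrates in general how reading UTF-8 bytes with the wrong decoder garbles text; it is not a diagnosis of the specific string "BÌ_rgern".

```python
import unicodedata

ch = 'ü'

code_point = ord(ch)                 # the Unicode code point, 0xFC
utf8_bytes = ch.encode('utf-8')      # two bytes: C3 BC
latin1_bytes = ch.encode('latin-1')  # one byte: FC

# Look up the official Unicode name of U+2026
ellipsis_name = unicodedata.name('\u2026')

# Decoding UTF-8 bytes with the wrong codec produces garbled text
mojibake = utf8_bytes.decode('latin-1')

print(hex(code_point), utf8_bytes, latin1_bytes, ellipsis_name)
```

According to the Unicode database, `\u2026` is named HORIZONTAL ELLIPSIS, so it is the "…" that often ends truncated tweet text rather than a smiley.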
