
I am reading in tweets and writing them to a file. However, I get a UnicodeEncodeError when I try to write some of these tweets. Is there a way to remove these non-UTF-8 characters so I can write out the rest of the tweet?

For example, a problem tweet may look like this:

Camera?

This is the code I am using:

with open("Tweets.txt",'w') as f:
    for user_tws in twitter.get_user_timeline(screen_name='camera',
                                          count = 200):
        try:
            f.write(user_tws["text"] + '\n')
        except UnicodeEncodeError:
            print("skipped: " + user_tws["text"])
            mod_tw = user_tws["text"]
            mod_tw=mod_tw.encode('utf-8','replace').decode('utf-8')
            print(mod_tw)
            f.write(mod_tw)

The error is this:

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3a5' in position 56: character maps to <undefined>

Minal Chauhan
S. M.

1 Answer


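You are not writing a UTF-8 encoded file. Without an explicit encoding, open() falls back to the platform's default encoding (the 'charmap' codec named in the error), which cannot represent the character. Add the encoding parameter to the open function: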

with open("Tweets.txt",'w', encoding='utf8') as f:
    ...
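
For completeness, here is a minimal, self-contained sketch of both approaches: writing the file as UTF-8, and, if the file really has to stay in a legacy encoding, dropping the characters that encoding cannot represent. The `tweets` list, the temporary file path, the `strip_unencodable` helper, and the choice of `cp1252` as the legacy encoding are all assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile

# Hypothetical sample data standing in for the tweet texts from the API.
tweets = ["Camera \U0001f3a5", "plain ASCII tweet"]


def strip_unencodable(text, encoding):
    """Drop every character that `encoding` cannot represent."""
    return text.encode(encoding, errors="ignore").decode(encoding)


# Assumed output location; the question writes to "Tweets.txt" instead.
path = os.path.join(tempfile.gettempdir(), "Tweets.txt")

# Option 1: write UTF-8, which can represent every tweet unchanged.
with open(path, "w", encoding="utf-8") as f:
    for text in tweets:
        f.write(text + "\n")

# Read the file back with the same encoding to check nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read().splitlines()

# Option 2: only if a legacy encoding is required (cp1252 is an assumption),
# remove the characters it cannot encode before writing.
stripped = [strip_unencodable(text, "cp1252") for text in tweets]
```

Option 1 is usually preferable because it keeps the tweet intact. If losing characters is acceptable, passing `errors="replace"` or `errors="ignore"` to `open()` is another way to avoid the exception without a helper function.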

Have fun

Yoav Glazner