I'm downloading tweets that contain specific words using the Twitter Streaming API and analysing them in Python with pymongo; the results come back in JSON format.

I'd like to search for specific emojis as well as words. When I print the output straight from the API, the emojis are in Unicode form, but when I retrieve tweets containing emojis from MongoDB, they come back in emoji form, which I can't use for analysis: whenever I view the tweet text in other programs, the emojis are stripped. I'd prefer to store them in Unicode form. Is there any way I can do this?

Edit: To be clear, I get back ' ⛽️' and the like instead of the Unicode form shown when I print the stream, e.g. '\U0001f600 \U0001f608'

  • What do you mean by "emoji form"? What is the difference between "emoji form" and "unicode form"? – Hayden Schiff Aug 10 '15 at 20:19
  • As in, instead of '\u00***' or anything similar, I literally get: ⛽️ which I can't use. – Kate Bradley Aug 10 '15 at 20:21
  • Ohhh, okay, so you want Unicode escape sequences instead of actual Unicode characters. Instead of pushing them to the database as `\uXXXX`, try pushing them as `\\uXXXX`. – Hayden Schiff Aug 10 '15 at 20:24
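
One possible way to get the escaped form suggested in the comment above is Python's built-in `unicode_escape` codec. The following is only a minimal sketch, assuming Python 3 and assuming the tweet text is already an ordinary `str` (for example, the `text` field of a document read back through pymongo). The helper names `to_escaped` and `from_escaped` are hypothetical and not part of pymongo or the Twitter API.

```python
# Sketch only: assumes Python 3, where str is Unicode.
# to_escaped / from_escaped are hypothetical helper names.

def to_escaped(text):
    # Encode with the built-in "unicode_escape" codec, which writes
    # non-ASCII characters as \xXX, \uXXXX or \UXXXXXXXX, then decode
    # the resulting ASCII bytes back into a plain str.
    return text.encode("unicode_escape").decode("ascii")


def from_escaped(escaped):
    # Reverse the conversion to recover the original characters.
    return escaped.encode("ascii").decode("unicode_escape")


tweet_text = "\U0001f600 \U0001f608"   # two emoji characters
escaped = to_escaped(tweet_text)
print(escaped)                          # \U0001f600 \U0001f608
print(from_escaped(escaped) == tweet_text)  # True
```

Note that this codec also escapes backslashes, newlines and tabs, so the escaped string is best treated as a separate field for analysis rather than a replacement for the original text. The escaped value could be computed either before inserting a tweet into MongoDB or after reading it back, since the stored characters themselves are unchanged.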