I have a collection of about 1.4 million tweets in a MongoDB collection. I want to find all that are NOT retweets, and am using Python. The structure of a document is as follows:
{
'_id': ObjectId('59388c046b0c1901172555b9'),
'coordinates': None,
'created_at': datetime.datetime(2016, 8, 18, 17, 17, 12),
'geo': None,
'is_quote': False,
'lang': 'en',
'text': b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s',
'tw_id': 766323071976247296,
'user_id': 2231233110,
'user_lang': 'en',
'user_loc': 'main; @Kan1shk3',
'user_name': 'sheezy0',
'user_timezone': 'Chennai'
}
I can write a query that works to find the particular tweet from above:
twitter_mongo_collection.find_one({
'text': b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s'
})
But when I try to find retweets, my code doesn't work, for example I try to find any tweets that start like this:
'text': b'RT some tweet'
Using this query:
find_one( {'text': {'$regex': "/^RT/" } } )
It doesn't return an error, but it doesn't find anything. I suspect it has something to do with that 'b' at the beginning before the text starts. I know I also need to put '$not:' in there somewhere but am not sure where.
Thanks!