
Background: I want to collect only unique tweets. According to comments on Stack Overflow, one way to do this is to create a set.

However, when I try the following code, I get a TypeError: Unhashable. I found some info here: TypeError : Unhashable type. I also know I can remove duplicates in MongoDB, where I am storing the tweets, but it's cleaner to do it before storing.

Question: Is there a way I can only collect unique tweets?

results = []
pages = 2 
counts = 100

while True:        
    for tweet in tweepy.Cursor(api.search, q = keywords, since="2017-07-21", until="2017-07-27", count = counts, lang = language,monitor_rate_limit=True, wait_on_rate_limit=True).pages(pages):
        results.extend(tweet)


    results = set(results)
SFC
  • It is difficult to say without a complete example, but you are trying to hash a list, which is not allowed. You should instead try to put every member of the list in the set: `a = set() for tweet in results: a.add(tweet)` – Srini Jul 27 '17 at 20:43
  • I tried the code `a = set() for tweet in results: a.add(tweet)` but I get an error: invalid syntax – SFC Jul 27 '17 at 20:55
  • If you pasted the line in directly, you would have got a syntax error for sure. Did you try it on separate lines with the correct indentation? Also, please provide the errors you encounter while debugging to help us better solve your problem – Srini Jul 27 '17 at 21:27

1 Answer


It is difficult to say for sure without a concrete example, but adding the items to a set one at a time works:

{ ~ }  » python
>>> results = ["hi", "hello", "hi", "goodbye"]
>>> a = set()
>>> for tweet in results:
...     a.add(tweet)
...
>>> print a
set(['hi', 'hello', 'goodbye'])
>>>

As you can see above, the set contains only one "hi". You shouldn't try to hash the entire list as a whole.

OK, as per your comments I did a little reverse engineering and determined that the tweets have a text field, which is what you need to add to the set,

so just replace a.add(tweet) with a.add(tweet.text)
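If you want to keep the tweet objects themselves rather than only their text, one option is to remember which keys you have already seen and keep the first tweet for each key. The following is only a rough sketch under some assumptions: `FakeStatus` is a hypothetical stand-in for the tweepy `Status` objects (it has `id` and `text` attributes and is deliberately unhashable), and `unique_by` is a helper name made up for this illustration, not part of tweepy.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a tweepy Status object. A non-frozen dataclass
# defines __eq__ and therefore has no __hash__, so it is unhashable, which
# mimics the "unhashable type" error from the question.
@dataclass
class FakeStatus:
    id: int
    text: str


def unique_by(items, key):
    """Return the first item seen for each distinct key, preserving order."""
    seen = set()      # holds only the hashable keys, never the items themselves
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


results = [
    FakeStatus(1, "hi"),
    FakeStatus(2, "hello"),
    FakeStatus(1, "hi"),   # same tweet collected twice
    FakeStatus(3, "hi"),   # different tweet with the same text
]

by_id = unique_by(results, key=lambda t: t.id)      # dedupe on the tweet id
by_text = unique_by(results, key=lambda t: t.text)  # dedupe on the tweet text
```

Keying on the id drops only tweets that were collected more than once, while keying on the text also drops distinct tweets that happen to say the same thing, so which key to use depends on what "unique" should mean for your data.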

Srini
  • Thanks for the suggestion. I tried the code per your example with the proper syntax and indentation. However, I still get the following error using the `tweepy.Cursor` code I provided above: `TypeError: unhashable type: 'Status'` – SFC Jul 27 '17 at 22:14