How to get tweets on certain dates using Tweepy

The code I wrote (in Jupyter) looks like this:

import tweepy as tw 
import xlsxwriter
import datetime 
import pandas as pd
consumer_key="#"
consumer_secret="#"
access_key="#"
access_secret="#"
try:
 auth = tw.OAuthHandler(consumer_key, consumer_secret)
 auth.set_access_token(access_key, access_secret)
 auth.get_authorization_url()
 api = tw.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True,compression=True,retry_count=3,retry_delay=10,timeout=15)
except tw.TweepError:
 print ('Error')

name="mahfiegilmez"

startDate = datetime.datetime(2018, 6, 24, 0, 0, 0)
endDate =   datetime.datetime(2018, 12, 31, 23, 59, 59)

say=0
tweets = []
from time import sleep
tmpTweets = api.user_timeline(name,count=200,tweet_mode="extended",lang="tr")

for tweet in tmpTweets:
        if tweet.created_at < endDate and tweet.created_at > startDate:
            tweets.append(tweet)

lastTweet = tmpTweets[-1].id
while (tmpTweets[-1].created_at > startDate):
    print("Next Tweet @", tmpTweets[-1].created_at, say)

    tmpTweets = api.user_timeline(name,max_id = tmpTweets[-1].id,tweet_mode="extended")
    if lastTweet == tmpTweets[-1].id:
        print("lastTweet")
        sleep(15)
    else:
        for tweet in tmpTweets:
            if tweet.created_at < endDate and tweet.created_at > startDate:
                tweets.append(tweet)
    lastTweet = tmpTweets[-1].id
    say+=1

The next section:

tweets2=[]
tweets.reverse()
for x in tweets:
    if(x.in_reply_to_status_id==None) or (x.in_reply_to_screen_name==name):
        if (not x.retweeted) and ("RT @" not in x.full_text):
            tweets2.append(x)

The output of the loop looks like this:

  • Next Tweet @ 2019-02-15 13:33:26 1095106703098605568 157
  • Next Tweet @ 2019-02-11 23:45:58 1094442196500209666 158
  • Next Tweet @ 2019-02-10 03:45:28 1094441678889463809 159
  • Next Tweet @ 2019-02-10 03:43:24 1094441678889463809 160
  • Next Tweet @ 2019-02-10 03:43:24 1094441678889463809 161
  • Next Tweet @ 2019-02-10 03:43:24 1094441678889463809 162
  • Next Tweet @ 2019-02-10 03:43:24 1094441678889463809 163 .....

How can I solve this?

Eventually it also raises this error:

> IndexError                                Traceback (most recent call last)
> <ipython-input-9-46264abdd8ef> in <module>
>       9         tweets.append(tweet)
>      10 
> ---> 11 while (tmpTweets[-1].created_at > startDate):
>      12     print("Last Tweet @", tmpTweets[-1].created_at, " - fetching some more")
>      13     tmpTweets = api.user_timeline(username, max_id = tmpTweets[-1].id)
> 
> IndexError: list index out of range
Mehmet D.
  • The reason might be that `API.user_timeline` returns only the 20 most recent statuses. https://tweepy.readthedocs.io/en/latest/api.html#API.user_timeline – Kaymal Nov 16 '19 at 22:07
  • I've already used the solution in the link below to overcome this problem, but this time it gives the above error. https://stackoverflow.com/questions/49731259/tweepy-get-tweets-among-two-dates – Mehmet D. Nov 16 '19 at 22:59
  • The problem is still unresolved. Can anyone help? – Mehmet D. Nov 25 '19 at 16:55

1 Answer

A better way to do this would be to use the since_id and max_id parameters for the API.user_timeline method / GET statuses/user_timeline endpoint, instead of making lots of unnecessary requests to loop through tons of Tweets outside the time range. You should also look into using a Cursor instead.
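
As a rough sketch of that suggestion: tweet IDs created since late 2010 are Snowflake IDs, whose upper bits encode a millisecond timestamp relative to the epoch 1288834974657, so a date range can be turned into approximate since_id and max_id bounds. The helper names below (datetime_to_snowflake, snowflake_to_datetime, date_range_to_ids, fetch_with_cursor) are hypothetical and not part of Tweepy, and the sketch assumes the dates are in UTC.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, in Unix milliseconds


def datetime_to_snowflake(dt):
    """Lowest Snowflake-style ID for the instant `dt` (naive values are taken as UTC)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    unix_ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (unix_ms - TWITTER_EPOCH_MS) << 22


def snowflake_to_datetime(tweet_id):
    """Creation time (UTC) encoded in a Snowflake-style tweet ID."""
    unix_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


def date_range_to_ids(start, end):
    """Approximate (since_id, max_id) bounds for tweets created between start and end."""
    since_id = datetime_to_snowflake(start) - 1  # since_id is exclusive
    max_id = datetime_to_snowflake(end)          # max_id is inclusive
    return since_id, max_id


def fetch_with_cursor(api, name, start, end):
    """Hypothetical helper: let a Cursor page through the bounded timeline."""
    import tweepy  # imported lazily so the helpers above work without Tweepy installed

    since_id, max_id = date_range_to_ids(start, end)
    cursor = tweepy.Cursor(
        api.user_timeline,
        screen_name=name,
        since_id=since_id,
        max_id=max_id,
        count=200,
        tweet_mode="extended",
    )
    return list(cursor.items())
```

With the question's variables this would be called as fetch_with_cursor(api, name, startDate, endDate). The 3,200-tweet cap of the endpoint still applies, so tweets older than that window cannot be reached this way either.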

The error you're encountering is likely because the account in question has more than 3200 Tweets since the time specified with startDate.

This method can only return up to 3,200 of a user's most recent Tweets.

https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline

So once it's gone through the most recent 3200 Tweets, the next call to the method/endpoint will assign an empty list to tmpTweets. It would then error when you attempt to index tmpTweets in the print statement. Your traceback seems to have different code than your snippet. If that print statement isn't there, then it would error when you attempt to index tmpTweets in the while condition, as in your traceback.
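
A minimal sketch of a paging loop that avoids this IndexError by stopping as soon as a request returns an empty page. The names collect_between and make_fetcher are hypothetical, and the FakeTweet timeline only stands in for real API responses so the loop can be run without credentials. It also passes the oldest ID minus one as max_id, because max_id is inclusive and would otherwise return the same tweet again.

```python
from collections import namedtuple
from datetime import datetime


def collect_between(fetch_page, start, end):
    """Walk a timeline from newest to oldest, keeping tweets with start < created_at < end."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break  # empty page: nothing older is available (e.g. the 3200 limit was hit)
        for tweet in page:
            if start < tweet.created_at < end:
                collected.append(tweet)
        oldest = page[-1]
        if oldest.created_at <= start:
            break  # everything older is outside the range
        max_id = oldest.id - 1  # max_id is inclusive, so step past the oldest tweet
    return collected


def make_fetcher(api, name):
    """Hypothetical adapter from a Tweepy API object to the fetch_page callable."""
    def fetch_page(max_id):
        kwargs = dict(screen_name=name, count=200, tweet_mode="extended")
        if max_id is not None:
            kwargs["max_id"] = max_id
        return api.user_timeline(**kwargs)
    return fetch_page


# Stand-in data: twelve fake tweets, newest first, one per month of 2018.
FakeTweet = namedtuple("FakeTweet", "id created_at")
timeline = [FakeTweet(i, datetime(2018, i, 1)) for i in range(12, 0, -1)]


def fake_fetch(max_id, page_size=5):
    older = [t for t in timeline if max_id is None or t.id <= max_id]
    return older[:page_size]


result = collect_between(fake_fetch, datetime(2018, 6, 24), datetime(2018, 12, 31, 23, 59, 59))
print([t.id for t in result])
```

Against the real API the call would be collect_between(make_fetcher(api, name), startDate, endDate). Depending on the Tweepy version, created_at may be timezone-aware, in which case startDate and endDate need a matching tzinfo for the comparison to work.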

Harmon758
  • Hello, and first of all, thank you for responding. I have edited the question and would appreciate it if you took a look. I know I'm dealing with the 3200-tweet limit. How can this code be adapted to work differently? **Could you explain with examples?** (Meanwhile, I couldn't find detailed examples of the **cursor**.) – Mehmet D. Nov 27 '19 at 00:22
  • This is still going to error in the same way in your `if` statement, since `tmpTweets` will be an empty list rather than the same as the previous request, since your `max_id` is going to be past the limit. The sleep also wouldn't do anything to mitigate the issue. You can simply check if the list is empty and break out of the loop if so, e.g. `if not tmpTweets: break`. You also have an indentation error where you assign `name`. The documentation for `Cursor` that I linked has examples: https://tweepy.readthedocs.io/en/latest/cursor_tutorial.html – Harmon758 Nov 27 '19 at 05:11