
I would like to scrape only 2,000 tweets per day related to a specific query (in this example it's Tesla). Does anyone know a way to set a maximum on the number of tweets I scrape? My code is below, without the access keys for my Academic Twitter API account. It works perfectly, but it keeps scraping all the tweets that are out there, which means I hit the 10 million monthly tweet cap very quickly.

Thank you in advance!

import time
import tweepy

client = tweepy.Client(
    wait_on_rate_limit=True,
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    bearer_token=my_bearer_token,
)

query = "climate change lang:en -is:retweet"
start_time = "2011-01-01T00:00:00Z"
end_time = "2011-06-30T23:59:59Z"

response_tweets = []
# Paginate through the full-archive search; each response holds up to 500 tweets.
for response in tweepy.Paginator(client.search_all_tweets,
                                 query=query,
                                 user_fields=["username", "public_metrics"],
                                 tweet_fields=["created_at", "text"],
                                 expansions=["author_id"],
                                 start_time=start_time,
                                 end_time=end_time,
                                 max_results=500):
    time.sleep(1)  # pause between requests to respect the rate limit
    response_tweets.append(response)

1 Answer


When using Tweepy you can cap the total number of tweets to retrieve with the Paginator's flatten(limit) method. In your case it would be something like:

paginator = tweepy.Paginator(client.search_all_tweets,
                             query=query,
                             user_fields=["username", "public_metrics"],
                             tweet_fields=["created_at", "text"],
                             expansions=["author_id"],
                             start_time=start_time,
                             end_time=end_time,
                             max_results=500)

# flatten() yields individual Tweet objects and stops after `limit` of them.
for tweet in paginator.flatten(limit=2000):
    response_tweets.append(tweet)

More information about Tweepy and pagination can be found here: Pagination-tweepy
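
If you want to keep the full Response objects instead (so the expansions and user fields stay attached in includes), a rough, untested sketch is to cap the number of pages with the Paginator's limit parameter rather than flattening; with max_results=500, limit=4 gives at most 2,000 tweets per run:

# Sketch, assuming client, query, start_time and end_time as defined in the question.
# Here `limit` counts pages (requests), not tweets: 4 pages x 500 tweets/page = 2,000 tweets max.
response_tweets = []
for response in tweepy.Paginator(client.search_all_tweets,
                                 query=query,
                                 user_fields=["username", "public_metrics"],
                                 tweet_fields=["created_at", "text"],
                                 expansions=["author_id"],
                                 start_time=start_time,
                                 end_time=end_time,
                                 max_results=500,
                                 limit=4):
    time.sleep(1)
    response_tweets.append(response)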

Shanazar