You can only retrieve 100 user objects per request with the api.lookup_users() method. Is there an easy way to retrieve more than 100 using Tweepy and Python? I have read this post: User ID to Username tweepy, but it does not cover the case of more than 100 ids. I am fairly new to Python, so I cannot come up with a solution myself. What I have tried is this:

users = []
i = 0
num_pages = 2
while i < num_pages:
    try:
        # Look up a collection of ids
        users.append(api.lookup_users(user_ids=ids[100*i:100*(i+1)-1]))
    except tweepy.TweepError:
        # We get a tweep error
        print('Something went wrong, quitting...')
    i = i + 1

where ids is a list containing the ids, but I get IndexError: list index out of range when I try to get a user with index higher than 100. If it helps I am only interested in getting the screen names from the user ids.

joakimj

2 Answers

I haven't tested it since I don't have access to the API.
But if you have a collection of user ids of any length, this should fetch all of them.

It fetches any remainder first: if you have a list of 250 ids, it first fetches the 50 users whose ids are the last 50 in the list.
Then it fetches the remaining 200 users in batches of 100.

from tweepy import api, TweepError

users = []
user_ids = []  # collection of user ids
remainder = len(user_ids) % 100  # ids left over after the full batches of 100
full_end = len(user_ids) - remainder  # index where the full batches end

try:
    # Fetch the remainder first: the last `remainder` ids in the list
    if remainder > 0:
        users.extend(api.lookup_users(user_ids=user_ids[-remainder:]))
    # Then fetch everything before the remainder in full batches of 100
    for start_at in range(0, full_end, 100):
        end_at = start_at + 100
        users.extend(api.lookup_users(user_ids=user_ids[start_at:end_at]))
except TweepError:
    print('Something went wrong, quitting...')
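
The batching can also be kept separate from the API call, which makes it easy to check without API access. This is only a minimal sketch: chunked is a hypothetical helper name, not part of Tweepy, and the default size of 100 is taken from the per-request limit mentioned in the question.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        # Slicing past the end of a list is safe: it returns a shorter list
        yield items[start:start + size]


# 250 ids are split into batches of 100, 100 and 50
batches = list(chunked(list(range(250))))
print([len(batch) for batch in batches])  # prints [100, 100, 50]
```

Each batch could then be passed to api.lookup_users(user_ids=batch) inside the try block, with no special case for the remainder.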
Cicero

You're right that you need to send the user ids to the API in batches of 100, but you're ignoring the fact that you might not have an exact multiple of 100 ids. Try the following:

import tweepy

def lookup_user_list(user_id_list, api):
    full_users = []
    users_count = len(user_id_list)
    try:
        # Ceiling division: an exact multiple of 100 must not add an empty final batch
        for i in range((users_count + 99) // 100):
            full_users.extend(api.lookup_users(user_ids=user_id_list[i*100:min((i+1)*100, users_count)]))
        return full_users
    except tweepy.TweepError:
        print('Something went wrong, quitting...')

results = lookup_user_list(ids, api)

By taking the minimum of (i+1)*100 and users_count we ensure the final loop only gets the users left over. results will be a list of the looked-up users.
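
Since the question only asks for screen names, the looked-up users can be reduced to a mapping from id to screen name. The following is a rough sketch rather than tested Tweepy code: screen_names_by_id and fake_lookup are hypothetical names, and fake_lookup stands in for the real API so the example runs offline. It assumes only that each returned user object has id and screen_name attributes.

```python
from types import SimpleNamespace


def screen_names_by_id(user_ids, lookup, batch_size=100):
    """Map each user id to its screen name, calling `lookup` once per batch."""
    names = {}
    for start in range(0, len(user_ids), batch_size):
        batch = user_ids[start:start + batch_size]
        for user in lookup(batch):
            names[user.id] = user.screen_name
    return names


def fake_lookup(batch):
    """Hypothetical stand-in for the API call; returns one object per id."""
    return [SimpleNamespace(id=uid, screen_name='user%d' % uid) for uid in batch]


names = screen_names_by_id(list(range(1, 251)), fake_lookup)
print(len(names), names[250])  # prints: 250 user250
```

With the real API, lookup could be something like lambda batch: api.lookup_users(user_ids=batch), assuming the same call signature as in the code above. The mapping may come back shorter than the id list, since the API can leave out users it cannot return.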

You may also hit rate limits. When setting up your API object, you can let tweepy handle these for you by waiting until the limit resets, which removes some of the hard work, like so:

consumer_key = 'X'
consumer_secret = 'X'
access_token = 'X'
access_token_secret = 'X'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
asongtoruin