
I am trying to scrape information (Username, Website, Last Tweet Date) for every account that a given account follows. For example, https://www.twitter.com/verified/following lists 365.7K followed accounts.

I have already scraped the usernames; now I need to visit each profile link and scrape the remaining data. The code works and retrieves everything I need, but after a certain number of page visits Twitter reports that I have exceeded the rate limit and stops showing any information about the accounts I visit.

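from time import sleep

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

# Assumption: the original post does not show how the driver is created.
driver = webdriver.Chrome()

def get_user_info(user):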
    """Gets User Info - Username, Website, Last Tweet Date"""
    driver.get(user[0])
    sleep(1)
    username = '@' + user[0].split('/')[-1]
    attempt = 0
    while True:
        try:
            website = driver.find_element_by_xpath("//div[@data-testid='UserProfileHeader_Items']/a").get_attribute('href')
        except NoSuchElementException:
            website = 'No Website'
            attempt += 1
            sleep(1)
        try:
            last_tweet_date = driver.find_element_by_xpath("//time").get_attribute('datetime')
        except NoSuchElementException:
            last_tweet_date = 'No Tweets'
            attempt += 1
            sleep(1)
        if website != 'No Website' and last_tweet_date != 'No Tweets':
            break
        if attempt > 1:
            break

    info = (username, website, last_tweet_date)
    return info

def user_info():
    info_list = []
    users_df = pd.read_csv('UserLinks.csv')
    user_list = users_df.values.tolist()
    for user in user_list:
        info = get_user_info(user)
        info_list.append(info)

    info_df = pd.DataFrame(info_list, columns=['Username', 'Website', 'Last Tweet Date'])
    info_df.to_csv('List2.csv', index=False)

What do you suggest?

Joe Mayo
  • Do you use the Twitter API? https://developer.twitter.com/en/docs/twitter-api/migrate – data_m Oct 22 '20 at 08:47
  • You'd have to of course abide by the allowed rate limit to be able to scrape without a suspension. – Lenin Oct 22 '20 at 08:53

1 Answer


Here's my answer to a similar question on rate limits:

How Rate Limit Works in Twitter

Essentially, every API has a rate limit that renews after a certain timeframe, e.g., 15 minutes. So you need to watch the rate limit headers or keep count yourself. When you reach the rate limit, pause your application and resume when the next rate limit window begins. Some APIs have a count parameter, and you'll want to set that to its maximum to get the most results per request. Also, Application auth typically gets more requests than User auth, if it's available for a given API call.
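
As a rough illustration of the two options (watching the headers or keeping count yourself), here is a minimal sketch. It assumes the response exposes Twitter-style `x-rate-limit-remaining` and `x-rate-limit-reset` headers. The names `seconds_to_wait` and `WindowCounter` are hypothetical helpers made up for this example, and the 15-minute default is only the example window mentioned above, not a guaranteed value for any particular endpoint.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds to pause before the next request.

    Assumes Twitter-style headers: x-rate-limit-remaining (requests left
    in the current window) and x-rate-limit-reset (Unix time at which
    the window renews).
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("x-rate-limit-reset", now))
    return max(0.0, reset_at - now)


class WindowCounter:
    """Keeps count yourself when no rate limit headers are available."""

    def __init__(self, max_requests, window_seconds=15 * 60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0

    def wait_time(self, now=None):
        """Return 0.0 and record a request if budget remains.

        Otherwise return the seconds left until the window renews; the
        caller should sleep that long and then ask again.
        """
        now = time.time() if now is None else now
        # Start a fresh window on first use or once the old one expires.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return 0.0
        return self.window_start + self.window_seconds - now
```

In a request loop, you might call `time.sleep(seconds_to_wait(response.headers))` after each API call, where `response` stands for whatever response object your HTTP client returns. If headers are not available, call `wait_time()` before each request and sleep for the returned duration.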

Joe Mayo