I am trying to scrape the information (Username, Website, Last Tweet Date) of every account that a certain account is following. For example, https://www.twitter.com/verified/following. As you can see, it has 365.7K usernames in its Following list.
I scraped the usernames, and now I have to visit each of those links and scrape that data. The code works fine and gets all the information I need, but after a certain number of profile visits Twitter says I have exceeded the rate limit and stops showing any information for the accounts I visit.
import pandas as pd
from time import sleep
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()


def get_user_info(user):
    """Gets User Info - Username, Website, Last Tweet Date"""
    driver.get(user[0])
    sleep(1)
    username = '@' + user[0].split('/')[-1]
    attempt = 0
    while True:
        # Website link from the profile header, if present
        try:
            website = driver.find_element_by_xpath(
                "//div[@data-testid='UserProfileHeader_Items']/a").get_attribute('href')
        except NoSuchElementException:
            website = 'No Website'
            attempt += 1
            sleep(1)
        # Datetime of the most recent tweet on the profile
        try:
            last_tweet_date = driver.find_element_by_xpath("//time").get_attribute('datetime')
        except NoSuchElementException:
            last_tweet_date = 'No Tweets'
            attempt += 1
            sleep(1)
        # Leave the loop once both values are found, or after two failed attempts
        if website != 'No Website' and last_tweet_date != 'No Tweets':
            break
        if attempt > 1:
            break
    info = (username, website, last_tweet_date)
    return info


def user_info():
    info_list = []
    users_df = pd.read_csv('UserLinks.csv')
    user_list = users_df.values.tolist()
    for user in user_list:
        info = get_user_info(user)
        info_list.append(info)
    info_df = pd.DataFrame(info_list, columns=['Username', 'Website', 'Last Tweet Date'])
    info_df.to_csv('List2.csv', index=False)
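The only mitigation I have thought of so far is slowing the loop down and backing off whenever a profile comes back empty, roughly like the sketch below. The function names, the 5-10 second pause between profiles, and the 15 minute back-off are my own guesses, not values documented by Twitter, and it reuses get_user_info from above:

from random import uniform


def get_user_info_backoff(user, max_retries=3):
    """Retries a profile, waiting out what looks like a rate-limit block."""
    for retry in range(max_retries):
        info = get_user_info(user)
        # 'No Website' AND 'No Tweets' together is how a rate-limited page looks
        # to my scraper (it can also just be an empty profile), so treat it as
        # "blocked" and wait before trying again.
        if info[1] != 'No Website' or info[2] != 'No Tweets':
            return info
        sleep(15 * 60)  # guessed wait for the rate-limit window to reset
    return info


def user_info_throttled():
    info_list = []
    users_df = pd.read_csv('UserLinks.csv')
    for user in users_df.values.tolist():
        info_list.append(get_user_info_backoff(user))
        sleep(uniform(5, 10))  # random pause between profiles to stay under the limit
    info_df = pd.DataFrame(info_list, columns=['Username', 'Website', 'Last Tweet Date'])
    info_df.to_csv('List2.csv', index=False)

With 365.7K profiles to visit, though, delays like these make the run extremely slow, and I am not sure waiting is even the right approach.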
What do you suggest?