I have been trying to scrape all the followers of one of my business pages, roughly 50k of them, using Selenium WebDriver and Python.

I can open the followers dialog box and scroll it to load more followers. However, scrolling gets slower and slower as more followers are loaded into the dialog.

This technique can work, but it would take days. It also requires the machine to stay awake the whole time, otherwise the process stops. On top of that, most of the time it fails with an error after scraping 3k to 4k followers.

Is there a problem with my script or my approach to scraping followers, or is this behavior normal? And is there a more efficient way of doing this?

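from selenium.webdriver.common.by import By

# driver, allfoll (total follower count) and followers_dir (output path) are defined earlier
followers_dialoge = driver.find_element(By.XPATH, "/html/body/div[3]/div[1]/div/div[2]")
n = 1
prev_length = 0  # number of follower links seen so far
for i in range(int(allfoll / n)):
    # Scroll step assumed here; it is not shown in the original snippet
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", followers_dialoge)
    loaded = driver.find_elements(By.CLASS_NAME, 'FPmhX')
    next_length = len(loaded)
    if next_length != prev_length:
        new_followers = loaded[-12:]  # each scroll loads 12 more followers
        prev_length = next_length
        with open(followers_dir, "a") as followers_file:
            for element in new_followers:
                if element.get_property('href'):
                    title = element.get_property('title')
                    href = element.get_property('href')
                    followers_file.write(title + "," + href + "," + "\n")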

While scrolling, each load adds 12 more followers to the dialog box, so the `[-12:]` slice picks up the 12 newest followers and I save them. I know I could wait for the whole dialog to load and save all 50k at once, but since the process tends to stop after a few minutes or hours, I save them as I go. (This may be one reason why it is slow.)

Muhammad Zeeshan
  • haha, I just ran into the same problem. I think the slowdown is caused by Selenium; as more followers are loaded it gets slower. I might be wrong though – sql-noob Nov 21 '18 at 17:35
  • I think it is because of the capacity of the browser. The browser has to handle a lot of data, which makes it slow to respond, so Selenium cannot work any faster either. Have you found any alternative or solution? – Muhammad Zeeshan Nov 22 '18 at 13:43
  • That's right, I meant the browser, not Selenium. I've found a solution, see my answer – sql-noob Nov 24 '18 at 18:26

1 Answer

You need to use the query_hash and end_cursor values to query for the next list of followers. Open Firefox, click on a user's Followers list, click Inspect Element, switch to the Network tab, filter by XHR and start scrolling down. You will see the requests that Instagram makes to fetch the next list of followers. This thread helped me get started: https://stackoverflow.com/a/50058700/1890619
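
To make the idea concrete, here is a rough sketch of the cursor loop in Python. It is not an official API: the variable names (id, first, after) and the JSON layout (data.user.edge_followed_by with edges and page_info) are assumptions based on what the Network tab showed at the time, so check them against the requests you actually see. fetch_page is a hypothetical callable you supply yourself; it should send the request with your logged-in session and return the parsed JSON.

```python
import json
from typing import Callable, Dict, Iterator, Optional


def build_params(query_hash: str, user_id: str, first: int,
                 after: Optional[str]) -> Dict[str, str]:
    """Build the query-string parameters for one page of followers.

    The variable names (id, first, after) are assumptions taken from the
    requests visible in the browser's Network tab; verify them yourself.
    """
    variables = {"id": user_id, "first": first}
    if after:
        variables["after"] = after  # end_cursor returned by the previous page
    return {"query_hash": query_hash, "variables": json.dumps(variables)}


def iter_followers(fetch_page: Callable[[Dict[str, str]], dict],
                   query_hash: str, user_id: str,
                   page_size: int = 50) -> Iterator[str]:
    """Yield follower usernames page by page until has_next_page is false."""
    cursor = None
    while True:
        page = fetch_page(build_params(query_hash, user_id, page_size, cursor))
        # Assumed response layout; adjust if the real JSON differs.
        followers = page["data"]["user"]["edge_followed_by"]
        for edge in followers["edges"]:
            yield edge["node"]["username"]
        info = followers["page_info"]
        if not info["has_next_page"]:
            break
        cursor = info["end_cursor"]
```

Because each request only returns the next batch, nothing accumulates in the browser, so this should avoid the slowdown you get from the growing dialog box. You can still append each username to your file as it is yielded, so a crash partway through does not lose what you have already fetched.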

sql-noob