
I am building a crawler for Flipkart using Python and bs4. It works, but there is a problem when I try to scrape more than 13-14 pages of reviews. The scraper works fine up to page 13, but on page 14 the page breaks: nothing appears except a message saying that something is not right. See the screenshot below:


[Screenshot: error page showing only the message "something is not right"]

To see whether there was a pattern, I refreshed the page many times and found that the data sometimes appeared after 5 refreshes and sometimes after 30 (there was no fixed pattern). I therefore wrote this part of the code to handle the situation:


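from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `browser` is the Selenium WebDriver created earlier (not shown here).
for count in range(1, 6521):
    # Pagination buttons on the review page.
    nav_btns = browser.find_elements(By.CLASS_NAME, '_33m_Yg')

    button = None

    for btn in nav_btns:
        # Compare as text so a non-numeric label cannot raise ValueError.
        if btn.text.strip() == str(count):
            button = btn
            break

    # Stop when there is no button for the requested page.
    if button is None:
        break

    try:
        button.send_keys(Keys.RETURN)
    except Exception:
        break

    # Handle the exception case ("Something is not right"):
    # keep refreshing until the review elements are present.
    timed_out = True
    while timed_out:
        try:
            WebDriverWait(browser, timeout=10).until(
                EC.presence_of_all_elements_located((By.CLASS_NAME, "_2xg6Ul")))
            timed_out = False
            print("Scraping... %d" % count)
        except TimeoutException:
            browser.refresh()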

It worked fine the last time, and I managed to scrape more than 100 pages. But when I ran the code again today, it just kept refreshing page 14 and no data ever appeared on that page. I also tried refreshing other pages after page 14 many times, but nothing appears on them either. Here is the page I am trying to scrape.
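
The refresh loop in my code has no upper bound, which is why it can spin forever on page 14. For reference, here is a minimal sketch of a bounded variant that waits longer after each failed attempt. `retry_with_backoff` and its parameters are hypothetical names of my own, not part of Selenium, and the idea that spacing out the refreshes helps is an assumption I have not confirmed for this site:

```python
import time


def retry_with_backoff(action, on_failure, max_attempts=5, base_delay=2.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached.

    Hypothetical helper, not part of Selenium. After each failed attempt
    (except the last) it calls on_failure() and then waits
    base_delay * 2**attempt seconds before trying again.
    Returns True on success, False if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            action()
            return True
        except retry_on:
            if attempt == max_attempts - 1:
                break  # out of attempts: do not refresh or sleep again
            on_failure()
            sleep(base_delay * 2 ** attempt)
    return False


# Assumed wiring inside the scraper loop (names taken from my code):
# loaded = retry_with_backoff(
#     action=lambda: WebDriverWait(browser, timeout=10).until(
#         EC.presence_of_all_elements_located((By.CLASS_NAME, "_2xg6Ul"))),
#     on_failure=browser.refresh,
#     retry_on=(TimeoutException,),
# )
# if not loaded:
#     break  # give up on this page instead of refreshing forever
```

With this, a page that never loads makes the scraper stop after a fixed number of refreshes instead of hanging, but it does not explain why the page breaks in the first place.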

I would like to understand what sort of problem this is and how I can deal with it.

Bhargav Rao
Prateek
