I am trying to build a crawler for Flipkart using Python and bs4. It works, but there is a problem when scraping more than 13-14 pages of reviews. The scraper works fine up to page 13, but once it reaches page 14 the page breaks: nothing appears except a message saying "Something is not right". See the screenshot below:
To figure out whether there was any pattern, I refreshed the page many times and found that the data sometimes appeared after 5 refreshes and sometimes after 30 (there was no fixed pattern). I therefore wrote this part of the code to handle the situation:
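    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # `browser` is an already-initialised Selenium WebDriver on the reviews page.
    for count in range(1, 6521):
        # Find the pagination button whose label matches the page we want.
        nav_btns = browser.find_elements_by_class_name('_33m_Yg')
        button = None
        for btn in nav_btns:
            number = int(btn.text)
            if number == count:
                button = btn
                break
        try:
            button.send_keys(Keys.RETURN)
        except Exception:
            # No matching button (or it could not be activated): stop crawling.
            break
        # Handle the "Something is not right" case: refresh until reviews load.
        isTimedOut = True
        while isTimedOut:
            try:
                WebDriverWait(browser, timeout=10).until(
                    EC.presence_of_all_elements_located((By.CLASS_NAME, "_2xg6Ul")))
                isTimedOut = False
                print("Scraping... %d" % count)
            except TimeoutException:
                browser.refresh()
                isTimedOut = True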
It worked fine the last time, and I was able to scrape more than 100 pages. But today when I ran the code again, it just kept refreshing page 14 and no data ever appeared on that page. I also tried refreshing other pages after page 14 many times, but nothing appears at all. Here is the page I am trying to scrape.
I would like to understand what sort of problem this is and how I can deal with it.
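For reference, the refresh loop in the code above has no upper bound, which is why it can keep refreshing page 14 forever. Below is a minimal sketch of a bounded variant that waits longer after each failure and eventually gives up. It is only an illustration, not a fix for whatever the site is doing: the names `retry_with_backoff`, `GaveUp`, `load_page` and `recover` are hypothetical, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException` so the sketch runs without a browser.

```python
import time


class GaveUp(Exception):
    """Raised when the page never loads within the allowed attempts."""


def retry_with_backoff(load_page, recover, max_attempts=5, base_delay=2.0,
                       sleep=time.sleep):
    """Call load_page() until it succeeds or max_attempts is exhausted.

    load_page: callable that returns on success and raises TimeoutError on
               failure (a stand-in for the WebDriverWait(...).until(...) call).
    recover:   callable run after each failure (a stand-in for
               browser.refresh()).
    sleep:     injectable so the delays can be observed or skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load_page()
        except TimeoutError:
            if attempt == max_attempts:
                raise GaveUp("no data after %d attempts" % max_attempts)
            recover()
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In the scraper, `load_page` would presumably wrap the `WebDriverWait` call, `recover` would be `browser.refresh`, and the `except` clause would catch `TimeoutException` instead. Catching `GaveUp` in the outer loop would then let the crawl log the failed page and stop or skip it rather than hang.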