I'm doing web scraping using selenium webdriver in Python with Proxy.
I want to browse more than 10k pages of single site using this scraping.
Issue is using this proxy I'm able to send request for single time only. when I'm sending another request on same link or another link of this site, I'm getting 416 error (kind of block IP using firewall) for 1-2 hours.
Note: I'm able to do scraping all normal sites with this code, but this site has kind of security which is prevent me for scraping.
Here is code.
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference(
"network.proxy.http", "74.73.148.42")
profile.set_preference("network.proxy.http_port", 3128)
profile.update_preferences()
browser = webdriver.Firefox(firefox_profile=profile)
browser.get('http://www.example.com/')
time.sleep(5)
element = browser.find_elements_by_css_selector(
'.well-sm:not(.mbn) .row .col-md-4 ul .fs-small a')
for ele in element:
print ele.get_attribute('href')
browser.quit()
Any solution ??