Disclaimer: This is my first foray into web scraping.
I have a list of ~400 search-result URLs that I am trying to loop through with Selenium to collect information. At a certain point, I am redirected and presented with the following text:
"Your access to VINELink.com has been declined due to higher than normal utilization levels... You are attempting to access this website from the following ip address. Please make sure your firewall settings are not restricting access. [MY IP ADDRESS]"
Is there a way to generate a list of valid IP addresses, pick one at random on each loop iteration, and pass it to the Selenium WebDriver as a proxy to avoid being blocked?
I understand that there are ethical considerations to this question (in reality, I've contacted the site to explain my benign use case and ask if they can unblock my real IP address); I'm mostly interested in whether this is something one could do.
Abbreviated list of URLs:
['http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662',
'http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=A21069',
'http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=B59293',
...]
Abbreviated code for loop (missing the actual list of valid IP addresses):
from selenium import webdriver
from selenium.webdriver.common.by import By

# XPath prefixes shared by the lookups below
result = '//*[@id="ngVewDiv"]/div/div/div/div[3]/div[3]/div[2]/div/search-result/div'
more_info = result + '/div[4]/div[1]/more-info'

info = {}
for url in detail_urls:
    proxy = ### SELECT RANDOM IP ADDRESS FROM A LIST OF VALID IP ADDRESSES ###
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server=' + str(proxy))
    driver = webdriver.Chrome(executable_path='/PATH/chromedriver', options=chrome_options)
    driver.implicitly_wait(3)  # set before loading so element lookups wait up to 3 s
    driver.get(url)
    # find_elements returns an empty list (instead of raising) when nothing matches
    buttons = driver.find_elements(By.XPATH, more_info + '/div[1]/button')
    if buttons:
        buttons[0].click()
        name = driver.find_element(By.XPATH, result + '/div[1]/div/div[1]/span[1]/span[1]/div/div/div[2]/span').text
        offenderid = driver.find_element(By.XPATH, more_info + '/div[2]/div/div/div[2]/div[1]/div/div[2]/span').text
        info[name] = [offenderid]
    driver.quit()  # end the browser session on every iteration, match or not
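
To make the placeholder step concrete, this is roughly what I have in mind. It is only a sketch: the addresses in `PROXIES` are made-up documentation addresses (not working proxies), and `proxy_argument` is a hypothetical helper name, not part of Selenium.

```python
import random

# Hypothetical placeholder proxies (TEST-NET-1 documentation addresses);
# real, working proxy endpoints would have to be substituted here.
PROXIES = [
    "192.0.2.10:8080",
    "192.0.2.11:8080",
    "192.0.2.12:3128",
]


def proxy_argument(proxies, rng=random):
    """Pick one proxy at random and format it as a Chrome command-line flag."""
    proxy = rng.choice(proxies)
    return "--proxy-server=http://" + proxy


if __name__ == "__main__":
    print(proxy_argument(PROXIES))
```

The returned string would then be passed to `chrome_options.add_argument(...)` inside the loop, so each new browser instance starts with a different randomly chosen proxy.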