I am currently web scraping an online database with Selenium in Python. The database's layout requires navigating between pages to reach the data I am interested in, and every time I run the code I invariably run into a 502 Bad Gateway error (picture below).
The error sometimes goes away on its own, but whether it does seems to depend on where in the loop the 502 pops up. Any advice on how to avoid it would be greatly appreciated. I have attached the portion of my code that interacts with Chrome below for reference, followed by a sketch of the retry workaround I am considering:
# ! Final !
#### Imports ####
import math
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

#### Define Driver & Starting URL ####
# Location of chromedriver
driver_path = "/Users/shrey/Desktop/Python Projects/Selenium/chromedriver"
# Beginning url & initialize driver
url = "https://tamu.libguides.com/az.php"
driver = webdriver.Chrome(service=Service(driver_path))
# Make the driver wait up to 10 s for elements to load whenever find_element() is run
driver.implicitly_wait(10)
# Launch driver
driver.get(url)
# Press "Ancestry Database" link
driver.find_element(By.LINK_TEXT,
"Ancestry Library").click()
# Give time for user to login to database
time.sleep(30)
# Go to link where we can search from
home = "https://www.ancestrylibrary.com/search/collections/1742/"
driver.get(home)
# Make sure we are focused on the first tab (where the search page just loaded)
driver.switch_to.window(driver.window_handles[0])
#### Loop through each year present in the data ####
for yr in range(1886, 1952):
    # Go to search home
    driver.get(home)
    # Find textbox & input year --------
    year_input = driver.find_element(By.CSS_SELECTOR, "#sfs_SelfCivilYear")
    year_input.send_keys(str(yr))
    # Press "Search" button
    driver.find_element(By.CSS_SELECTOR, "#searchButton").click()
    # Determine how many result pages we need to visit --------
    # Find the text that includes the total number of results (formatted as "Results 1–20 of 1,351")
    n_raw = driver.find_element(By.XPATH,
                                '//*[@id="results-header"]/h3').text
    # Isolate the important number: the last word of the string ("1,351")
    n_num = n_raw.split()[-1]
    # Remove the comma and convert to a number ("1,351" >>> 1351)
    n_total = int(n_num.replace(",", ""))
    # Number of pages needed at 20 results per page
    # (ceil avoids visiting one page too many when the total is an exact multiple of 20)
    loop_count = math.ceil(n_total / 20)
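    # e.g. 1,351 results -> ceil(1351 / 20) = 68 pages of up to 20 results each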
    # Loop thru pages and collect links --------
    # Init empty list
    links = []
    # Visit each results page (count calc'd above)
    for i in range(loop_count):
        # Find & store all "View Result" links on the current page
        current_pg_links = driver.find_elements(By.CSS_SELECTOR,
                                                ".srchFoundDB a")
        # Loop through all links pulled & append
        for link in current_pg_links:
            # Get the actual URL from the 'href' attribute and append it to the final list
            links.append(link.get_attribute('href'))
        # Press the "next page" button, unless we are on the last page
        if i < loop_count - 1:
            driver.find_element(By.CSS_SELECTOR,
                                "a.ancBtn.sml.green.icon.iconArrowRight").click()