
set-up

I use Python + Selenium to scrape company info from this site.

Since the website doesn't let me load page URLs directly, I plan to click the next-page arrow element at the bottom of the list, using a while loop with a counter.


the code

browser.get('https://new.abb.com/channel-partners/search#') 
wait.until(EC.visibility_of_element_located((By.CLASS_NAME,'abb-pagination')))

# start while loop and counter
c = 1
while c < 65:        
    c += 1

    # obtain list of companies element
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'#PublicWrapper > main > section:nth-child(7) > div:nth-child(2)')))
    resultlist = el_css('#PublicWrapper > main > section:nth-child(7) > div:nth-child(2)')

    # loop over companies in list
    for company in resultlist.find_elements_by_xpath('div'):

        # company name
        name = company.find_element_by_xpath('h3/a/span').text

        # code to capture more company info follows

    # next page arrow element 
    next_page_arrow = el_cn('abb-pagination__item--next')    
    next_page_arrow.click()    

issue

The code captures the company info just fine outside of the while loop, i.e. for just the first page.

However, when inserted in the while loop to iterate over the pages, I get the following error: StaleElementReferenceException: stale element reference: element is not attached to the page document (Session info: chrome=88.0.4324.192)

If I step through it, it seems the resultlist of the subsequent page does get captured, but the loop over the companies in resultlist raises this error.

What to do?

LucSpan

1 Answer


The simplest solution would be to add a short fixed pause on each iteration:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

driver.get('https://new.abb.com/channel-partners/search#')

company_name = []
while True:
    time.sleep(1)    
    company_name+=[elem.text for elem in wait.until(EC.presence_of_all_elements_located((By.XPATH,'//span[@property="name"]')))]
    # if next page arrow element still available, click, else break while
    if driver.find_elements_by_xpath('//li[@class="abb-pagination__item--next"]/a[contains(@href,"#page")]'):
        wait.until(EC.presence_of_element_located((By.XPATH,'//li[@class="abb-pagination__item--next"]/a'))).click()
    else:
        break

len(company_name)

output:

951

You don't need the counter: you can check whether the arrow's URL is still available, so that if a page 65, 66, [...] were added, your logic would still work.

The problem here is that the while loop runs too fast and the page does not load in time. Alternatively, you could save the first list of company names, click the next arrow, and compare it with the new list; if both are the same, keep waiting until the new list differs from the previous one.
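A minimal sketch of that compare-with-the-previous-list idea. The helper `wait_for_new_list` is hypothetical (not part of Selenium): it builds a predicate for `wait.until()` that keeps polling until the fetched list is non-empty and differs from the previous page's list. The selectors in the commented usage are the ones from the answer above.

```python
def wait_for_new_list(fetch, previous):
    """Return a WebDriverWait predicate: falsy until fetch() yields a
    non-empty list different from `previous`, then returns that list."""
    def predicate(_driver):            # WebDriverWait passes the driver in
        current = fetch()
        return current if current and current != previous else False
    return predicate

# Usage with the setup from the answer (run inside a real session):
# fetch = lambda: [el.text for el in
#                  driver.find_elements_by_xpath('//span[@property="name"]')]
# company_name, previous = [], None
# while True:
#     previous = wait.until(wait_for_new_list(fetch, previous))
#     company_name += previous
#     if driver.find_elements_by_xpath(
#             '//li[@class="abb-pagination__item--next"]/a[contains(@href,"#page")]'):
#         driver.find_element_by_xpath(
#             '//li[@class="abb-pagination__item--next"]/a').click()
#     else:
#         break
```

This replaces the fixed `time.sleep(1)` with a condition tied to the actual page content, so slow pages no longer break the loop and fast pages aren't needlessly delayed.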

Everton Reis