I have a question about looping over elements with Selenium in Python. I want to iterate over a list of links found with 'driver.find_elements_by_id' and click them one by one. The problem is that each time I click a link (from 'linklist' in the code), the page is refreshed, and I get this error: 'Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed.'

I know the reason is that the list of links disappears after the click. But how, in general, can I keep iterating over the list in Selenium once the original page is gone? I tried 'driver.back()', but it doesn't seem to work.

The error is raised at this line in the code:

link.click()  

The link list is at the URL below. For each entry, I want to click the Documents button and then download the first file once the new page is displayed: 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001467373&type=10-K&dateb=20101231&owner=exclude&count=40'

Can someone have a look at this problem? Thank you!

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import unittest
import os
import time
from bs4 import BeautifulSoup
from selenium.webdriver.common.keys import Keys
import requests
import html2text



class LoginTest(unittest.TestCase):
 def setUp(self):


    self.driver=webdriver.Firefox()
    self.driver.get("https://www.sec.gov/edgar/searchedgar/companysearch.html")


 def test_Login(self):
    driver=self.driver

    cikID="cik"
    searchButtonID="cik_find"
    typeID="//*[@id='type']"
    priorID="prior_to"
    cik="00001467373"
    Type="10-K"
    prior="20101231"
    search2button="//*[@id='contentDiv']/div[2]/form/table/tbody/tr/td[6]/input[1]"


    documentsbuttonid="documentsbutton"
    formbuttonxpath='//a[text()="d10k.htm"]'


    cikElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_id(cikID))

    cikElement.clear()
    cikElement.send_keys(cik)


    searchButtonElement=WebDriverWait(driver,20).until(lambda driver:driver.find_element_by_id(searchButtonID))
    searchButtonElement.click()

    typeElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(typeID))
    typeElement.clear()
    typeElement.send_keys(Type)
    priorElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_id(priorID))
    priorElement.clear()
    priorElement.send_keys(prior)
    search2Element=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(search2button))
    search2Element.send_keys(Keys.SPACE)
    time.sleep(1)

    documentsButtonElement=WebDriverWait(driver,20).until(lambda driver:driver.find_element_by_id(documentsbuttonid))
    a=driver.current_url



    window_be1 = driver.window_handles[0]
    linklist=driver.find_elements_by_id(documentsbuttonid)


    with open("D:/doc2/"+"a"+".txt", mode="w",errors="ignore") as newfile:


        for link in linklist:

                link.click()            

                formElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(formbuttonxpath))
                formElement.click()
                time.sleep(1)

                t=driver.current_url

                r = requests.get(t)
                data = r.text

                newfile.write(html2text.html2text(data))

                drive.back()
                drive.back()


 def tearDown(self):
    self.driver.quit()
if __name__=='__main__':
 unittest.main()
SXC88
  • Not sure if this is the problem, but you're using `drive.back()` in the `for` loop instead of `driver.back()` – Farhan.K Nov 17 '16 at 10:53

1 Answer

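Don't iterate over a list of web elements: those references go stale as soon as the page changes. Collect the link URLs as plain strings first. Try something like this (adjust the XPath so that it matches the links on your page):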

linklist = []
for link in driver.find_elements_by_xpath('//h4[@class="title"]/a'):
    linklist.append(link.get_attribute('href'))

Then you can iterate over the list of URLs:

for link in linklist:
    driver.get(link)
    # do some actions on page
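
Copying the 'href' values first works because strings survive navigation while element references do not. A minimal sketch of that step as a reusable helper: 'collect_hrefs' is a hypothetical name, not part of Selenium, and it only assumes that each element has a 'get_attribute' method, as Selenium's WebElement does.

```python
def collect_hrefs(elements):
    """Return each element's href as a plain string.

    Elements without an href are skipped, and duplicate URLs are
    dropped while the original order is kept.
    """
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs
```

With the snippet above, this would be called as linklist = collect_hrefs(driver.find_elements_by_xpath('//h4[@class="title"]/a')). In Selenium 4 the 'find_elements_by_*' helpers were replaced by 'driver.find_elements(By.XPATH, ...)'.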

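If you want to physically click each link, you might need something like the following. It re-locates the link on every iteration, so the list page has to be displayed again (for example via 'driver.back()') before the next pass, and the 'href' attribute in the page markup must be the same absolute URL that 'get_attribute' returned: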

for link in linklist:
    driver.find_element_by_xpath('//h4[@class="title"]/a[@href="%s"]' % link).click()
    # do some actions on page
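
If the click itself matters, another common pattern is to look the elements up again after every navigation and pick the next one by index, so no reference is reused across a page load. This is a sketch under assumptions, not a definitive implementation: 'click_each', 'find_links', 'handle_page' and 'back_steps' are hypothetical names, and it assumes the list page shows the same links in the same order after going back.

```python
def click_each(driver, find_links, handle_page, back_steps=1):
    """Click every link returned by find_links(driver), one per pass.

    The list is looked up again before each click, so the element that
    gets clicked always belongs to the page currently displayed.
    Returns the number of links that were clicked.
    """
    index = 0
    while True:
        links = find_links(driver)      # fresh references for the current page
        if index >= len(links):
            break                       # every link has been visited
        links[index].click()
        handle_page(driver)             # caller's work on the target page
        for _ in range(back_steps):
            driver.back()               # return to the list page
        index += 1
    return index
```

For the script in the question, 'find_links' could be lambda d: d.find_elements_by_id("documentsbutton"), with back_steps=2 because the loop body navigates two pages deep before returning.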
Andersson
  • @Andersson, maybe you would like to have a look at this post: http://stackoverflow.com/questions/40748555/python-threading-timer-set-time-limit-when-program-runs-out-of-time?noredirect=1#comment68724872_40748555 – SXC88 Nov 22 '16 at 20:09