I'm trying to get the page source of each link in a list. My idea is to use a webdriver to open a link, save the page source in a variable, and then go back and continue with the next link. But Python tells me that an element is not attached to the page document (a stale element reference). Is there a solution for this? Thanks in advance!

browserFut = webdriver.Chrome(PATH)
browserFut.get(link)

page_sources = []
links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
for link in links:
    link.click()
    page_sources += [browserFut.page_source]
    browserFut.back()
    time.sleep(1)

3 Answers

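The links have to be located again inside the for loop, because the references found before the first click no longer point into the page once the browser has navigated away and back. Try something like this.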

length = len(browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a'))
for i in range(length):
    links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
    links[i].click()
    page_sources += [browserFut.page_source]
    browserFut.back()
    time.sleep(1)

If the link opens in a new tab after the click:

length = len(browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a'))
for i in range(length):
    links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
    links[i].click()
    handles = browserFut.window_handles
    browserFut.switch_to.window(handles[1])
    page_sources += [browserFut.page_source]
    browserFut.close()
    browserFut.switch_to.window(handles[0])
    time.sleep(1)
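Indexing `handles[1]` assumes that exactly two windows are open and that the new one is listed second. A slightly more defensive sketch compares the handles before and after the click. `switch_to_new_window` is a hypothetical helper name, and `driver` is assumed to be any Selenium WebDriver such as `browserFut`:

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the window opened since handles_before was recorded.

    Hypothetical helper: handles_before is the value of
    driver.window_handles captured just before the click.
    """
    # Any handle that was not present before the click belongs to the new tab.
    new_handles = [h for h in driver.window_handles if h not in handles_before]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

In the loop above this would mean storing `before = browserFut.window_handles` ahead of `links[i].click()` and then calling `switch_to_new_window(browserFut, before)`.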
pmadhu
  • Refer to this [link](https://stackoverflow.com/a/69043005/16452840). – pmadhu Sep 06 '21 at 01:32
  • It seems to have the same problem. There must be something happening inside the loop that I don't understand at all. It sends back this message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class="dml-page-loader dml-page-loader--pb"]"} – Juan José Campos Sep 06 '21 at 13:03
  • @Juan José Campos - Try this xpath once `links = browserFut.find_elements_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]//a')` – pmadhu Sep 06 '21 at 13:17
  • Done, but nothing apparently changed – Juan José Campos Sep 06 '21 at 13:24
  • @Juan José Campos - Is it possible to share the URL? When you click on a link, does it open in a new tab or in the same tab? If it opens in a new tab, it's a completely different scenario. – pmadhu Sep 06 '21 at 13:34
  • It opens in the same tab – Juan José Campos Sep 06 '21 at 16:12

On the second iteration the elements have become stale, so you have to locate them again.

links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
for j in range(len(links)):
    # Locate the links again on every pass; the earlier references are stale after back().
    elements = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
    elements[j].click()
    page_sources += [browserFut.page_source]
    browserFut.back()
    time.sleep(5)
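The fixed `time.sleep(5)` is a guess at how long the list page needs to come back after `back()`. If the page is slower than that, the next lookup can fail with "no such element". As a rough alternative, the sketch below polls until the XPath matches something. It is a hand-rolled stand-in for Selenium's own explicit waits, not part of Selenium; `wait_for_elements` and its parameters are hypothetical names, and the string `"xpath"` is the value of Selenium's `By.XPATH`:

```python
import time


def wait_for_elements(driver, xpath, timeout=10.0, poll=0.25):
    """Poll until xpath matches at least one element, then return the matches.

    Hypothetical helper: raises TimeoutError if nothing matches in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising when nothing matches.
        found = driver.find_elements("xpath", xpath)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing matched {xpath!r} within {timeout}s")
        time.sleep(poll)
```

Inside the loop, `elements = wait_for_elements(browserFut, '//div[@class="dml-page-loader dml-page-loader--pb"]//a')` could then replace both the lookup and the sleep.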
cruisepandey
  • The same problem again. But I think I just figured it out. When I run the program it completes the first iteration as I said. Then it gets stuck, but I can still interact with the page. When I manually click the back button, the page is blank, and when I then click the forward button, the initial page (the one I take the links from) loads. I'm going to try saving the initial page's URL and opening it directly instead of going back – Juan José Campos Sep 06 '21 at 13:12
  • Use `browserFut.execute_script("window.history.go(-1)")` instead of `browserFut.back()`. I also think saving the links up front makes sense, because then you can't lose any of them. – cruisepandey Sep 06 '21 at 13:26
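The idea of saving the links first can be sketched as follows: read every `href` as a plain string before navigating, then open each URL directly, so there are no element references left to go stale and no `back()` is needed. This is only a sketch under assumptions: `collect_page_sources` is a hypothetical helper, it assumes every link is a real `<a href=...>` rather than a JavaScript click handler, and the string `"xpath"` is the value of Selenium's `By.XPATH`:

```python
def collect_page_sources(driver, link_xpath):
    """Visit every link matched by link_xpath and return the page sources.

    Hypothetical helper: driver is assumed to be a Selenium WebDriver.
    """
    # Read the href strings first; strings cannot go stale the way elements do.
    hrefs = []
    for anchor in driver.find_elements("xpath", link_xpath):
        href = anchor.get_attribute("href")
        # Skip anchors without an href and duplicates, keeping first-seen order.
        if href and href not in hrefs:
            hrefs.append(href)

    page_sources = []
    for href in hrefs:
        driver.get(href)  # navigate directly instead of click() followed by back()
        page_sources.append(driver.page_source)
    return page_sources
```

A call might look like `page_sources = collect_page_sources(browserFut, '//div[@class="dml-page-loader dml-page-loader--pb"]//a')`. Dropping duplicate URLs also avoids visiting the same page more than once.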

Credit to the users who advised me. At last, something worked. The only difference is the links I'm taking. It seems that I was taking some extra links that sent me to the same page.

page_sources = []
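# Count with the same XPath that the loop indexes into, so x never runs past the end of links.
l = len(browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_xpath('//div[@class="sp-o-market__title"]//a'))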
for x in range(l):
    links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_xpath('//div[@class="sp-o-market__title"]//a')
    links[x].click()
    page_sources += [browserFut.page_source]
    time.sleep(1)
    browserFut.back()
    time.sleep(1)
    
bad_coder
  • The only difference is in the XPath. As I said, I was picking up some extra links, most of them duplicates, and I think that is why the loop was failing. I don't really understand why, but adding "//a" at the end of the XPath and dropping those extra links made the code work. But thank you both anyway; the idea of selecting the items with [x] instead of iterating over the links was helpful. – Juan José Campos Sep 06 '21 at 16:14