Using Selenium+python to extract HTML code from a list of links

Question

I'm trying to get the page sources for a list of links. My idea is to use a webdriver to open a link, save the page source in a variable, and then go back to continue with the next link. But for some reason Python tells me that an element is not attached to the page document. Is there any solution for this? Thanks in advance!

browserFut = webdriver.Chrome(PATH)
browserFut.get(link)

page_sources = []
links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
for link in links:
    link.click()
    page_sources += [browserFut.page_source]
    browserFut.back()
    time.sleep(1)

What happens when the first `link.click()` gets triggered? Does it redirect you anywhere? – cruisepandey, Sep 06 '21 at 07:06
With that code, the driver completes one iteration: it opens the first link, saves the page source and then goes back. After that it returns the error message I mentioned above. – Juan José Campos, Sep 06 '21 at 12:56
Check out the answer below, and let me know if you face any issues. – cruisepandey, Sep 06 '21 at 13:01

pmadhu · Answer 1 · 2021-09-06T13:56:55.463

0

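After `back()` the page is loaded again, so the element references collected before the loop no longer belong to the current document. We need to locate the links again inside the for loop. Try something like this.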

length = len(browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a'))
for i in range(length):
    links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
    links[i].click()
    page_sources += [browserFut.page_source]
    browserFut.back()
    time.sleep(1)

If the link opens in a new tab after the click:

length = len(browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a'))
for i in range(length):
    links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
    links[i].click()
    handles = browserFut.window_handles
    browserFut.switch_to.window(handles[1])
    page_sources += [browserFut.page_source]
    browserFut.close()
    browserFut.switch_to.window(handles[0])
    time.sleep(1)
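
A variation on the new-tab case, as a rough sketch only: instead of assuming the new tab is `handles[1]`, compare the window handles before and after the click. The function name `source_from_new_tab` is made up for illustration, and the sketch assumes the tab has already opened by the time the click returns (a short wait may be needed on a real page).

```python
def source_from_new_tab(driver, link):
    """Click `link` and return the page source of the tab it opened.

    `driver` is expected to behave like a Selenium WebDriver and `link`
    like a WebElement. Hypothetical helper, not part of Selenium.
    """
    original = driver.current_window_handle
    before = set(driver.window_handles)

    link.click()

    # Any handle that was not there before the click belongs to the new tab
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        # The link opened in the same tab, so there is nothing to switch to
        return driver.page_source

    driver.switch_to.window(new_handles[0])
    try:
        return driver.page_source
    finally:
        # Close the new tab and return to the list page in either case
        driver.close()
        driver.switch_to.window(original)
```

Inside the loop this would be called as `page_sources.append(source_from_new_tab(browserFut, links[i]))`.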

edited Sep 06 '21 at 13:56

answered Sep 06 '21 at 01:27

pmadhu


Refer to this [link](https://stackoverflow.com/a/69043005/16452840). – pmadhu Sep 06 '21 at 01:32
It seems to have the same problem. There must be something happening inside the loop that I don't understand at all. It sends back this message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class="dml-page-loader dml-page-loader--pb"]"} – Juan José Campos Sep 06 '21 at 13:03
@Juan José Campos - Try this xpath once `links = browserFut.find_elements_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]//a')` – pmadhu Sep 06 '21 at 13:17
Done, but nothing apparently changed – Juan José Campos Sep 06 '21 at 13:24
@Juan José Campos - Is it possible to share the URL? When you click on a link, does it open in a new tab or in the same tab? If it opens in a new tab, it's a completely different scenario. – pmadhu Sep 06 '21 at 13:34
It opens in the same tab – Juan José Campos Sep 06 '21 at 16:12

score 0 · Accepted Answer · answered Sep 06 '21 at 13:00

On the second iteration the elements become stale, so you have to locate them again.

links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
for j in range(len(links)):
    # Locate the links again on every pass; the previous references are stale after back()
    elements = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_tag_name('a')
    elements[j].click()
    page_sources += [browserFut.page_source]
    browserFut.back()
    time.sleep(5)

answered Sep 06 '21 at 13:00

cruisepandey


The same problem again. But I think I just figured it out. When I run the program it completes the first iteration, as I said. Then it gets stuck, but I can still interact with the page. When I manually click the back button, the page is blank; when I then click the forward button, the initial page (the one I take the links from) loads. I'm going to try saving the initial page's URL and navigating to it instead of going back. – Juan José Campos Sep 06 '21 at 13:12
Use `browserFut.execute_script("window.history.go(-1)")` instead of `browserFut.back()`. Also, I think saving the links in the first place makes sense, because then you won't lose any of them. – cruisepandey Sep 06 '21 at 13:26
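
The "save the links first" idea from the comments above could look roughly like the sketch below. It reads every `href` into a plain list before navigating, then opens each URL with `get()`, so there is nothing to go back to and no element reference left to go stale. The names `collect_page_sources` and `LINKS_XPATH` are made up for illustration. The XPath is the one from the question. `find_elements("xpath", ...)` is the generic locator call (the string `"xpath"` is the value of `By.XPATH`), which should also work on newer Selenium releases where the `find_elements_by_*` helpers are no longer available.

```python
# Hypothetical constant: the container XPath from the question plus "//a"
LINKS_XPATH = '//div[@class="dml-page-loader dml-page-loader--pb"]//a'


def collect_page_sources(driver, links_xpath=LINKS_XPATH):
    """Return the page source of every distinct URL linked under links_xpath.

    `driver` is expected to behave like a Selenium WebDriver that already
    has the list page loaded. Hypothetical helper, not part of Selenium.
    """
    # Copy the href strings out first; strings cannot go stale like WebElements
    hrefs = []
    for anchor in driver.find_elements("xpath", links_xpath):
        href = anchor.get_attribute("href")
        # Skip anchors without an href and URLs that were already collected
        if href and href not in hrefs:
            hrefs.append(href)

    sources = []
    for href in hrefs:
        driver.get(href)  # navigate directly instead of click() + back()
        sources.append(driver.page_source)
    return sources
```

With the driver from the question this would be called as `page_sources = collect_page_sources(browserFut)`. Skipping repeated URLs also takes care of duplicated links that point to the same page.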

score 0 · Answer 3 · edited Sep 07 '21 at 00:42

Credit to the users who advised me. At last, something worked. The only difference is the links I'm taking. It seems that I was taking some extra links that sent me to the same page.

page_sources = []
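# Count with the same locator that is indexed inside the loop, so x never runs past the end of links
l = len(browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_xpath('//div[@class="sp-o-market__title"]//a'))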
for x in range(l):
    links = browserFut.find_element_by_xpath('//div[@class="dml-page-loader dml-page-loader--pb"]').find_elements_by_xpath('//div[@class="sp-o-market__title"]//a')
    links[x].click()
    page_sources += [browserFut.page_source]
    time.sleep(1)
    browserFut.back()
    time.sleep(1)

edited Sep 07 '21 at 00:42

bad_coder


answered Sep 06 '21 at 13:41

Juan José Campos


The only difference is in the XPath. As I said, I was picking up some extra links, most of them duplicates, and I think that is why the loop was failing. I don't really understand why, but adding "//a" at the end of the XPath and dropping those extra links made the code work. But thank you both anyway; the idea of selecting the items with [x] instead of iterating over the links was helpful. – Juan José Campos Sep 06 '21 at 16:14
