Extracting links from website with selenium bs4 and python

Question

The title might make it look like this question has already been asked, but I had no luck finding an answer.

I need help with a link-extracting program in Python.

It actually works: it finds all <a> elements on a webpage, takes their href values, puts them in an array, and exports that array to a CSV file, which is what I want.

But there is one thing I can't figure out.

The website is dynamic, so I am using the Selenium webdriver to get the JavaScript-rendered results.

The code is pretty simple. I open the website with the webdriver and get its content. Then I get all the links with

results = driver.find_elements_by_tag_name('a')

Then I loop through the results with a for loop and get each href with

result.get_attribute("href")

I store results in an array and then print them out.
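
The steps described so far can be sketched roughly as follows. This is a minimal illustration, not the asker's actual program: the helper names collect_hrefs and write_hrefs_csv and the single "href" CSV column are assumptions made for the example. The functions accept any objects that offer get_attribute(), which is what Selenium's link elements provide.

```python
import csv


def collect_hrefs(elements):
    """Return the href of every element that has one, in page order."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # anchors without an href yield None, so skip them
            hrefs.append(href)
    return hrefs


def write_hrefs_csv(hrefs, path):
    """Write one href per row under a header row (hypothetical layout)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href"])
        for href in hrefs:
            writer.writerow([href])
```

With a running webdriver, the list returned by the find_elements call shown above would be passed straight to collect_hrefs.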

The problem is that I can't get the text of the links.

<a href="https://www.google.com">This leads to Google</a>

Is there any way to get the 'This leads to Google' string?

I need it for every link that is stored in the array.

Thank you for your time.

UPDATE:

It seems it only gets the text of dynamic links. I just noticed this, and it is really strange. For hard-coded links it returns an empty string; for a dynamic link it returns its name.

I already tried that. For some reason it returns a lot of empty strings, even though the text is there when I look at the website's source code. It also returns an empty string for text that is not dynamic. In case you are wondering how I know that some data is not dynamic: I am testing it on a website I made. It returns a few strings, but only about 20% of them. - Mileta Dulovic, Jul 20 '19 at 16:55
Use result.get_attribute('innerHTML') or result.get_attribute('textContent'). Fix the quotes if they come out wrong; I am typing on a mobile device. - KunduK, Jul 20 '19 at 17:08
Thank you @KunduK. You saved me. I posted an answer to this thread. - Mileta Dulovic, Jul 20 '19 at 17:15

score 0 · Accepted Answer · answered Jul 20 '19 at 17:14

Okay, so the answer is that instead of using .text you should use get_attribute("textContent"). It works better than get_attribute("innerHTML").

Thanks KunduK for this answer. You saved my day :)
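
A minimal sketch of how this could look in practice. The helper names link_name and extract_links are made up for illustration, and the whitespace cleanup is an added assumption rather than part of the original fix. One likely reason .text came back empty is that Selenium's .text only returns text that is rendered as visible, while textContent is read from the DOM regardless of visibility; innerHTML would also include any nested markup.

```python
def link_name(element):
    """Return the link's text via textContent, with whitespace collapsed."""
    text = element.get_attribute("textContent")
    # textContent may be None or contain newlines and indentation
    return " ".join((text or "").split())


def extract_links(elements):
    """Return (href, name) pairs for every element that has an href."""
    links = []
    for element in elements:
        href = element.get_attribute("href")
        if not href:
            continue  # skip anchors that do not point anywhere
        links.append((href, link_name(element)))
    return links
```

With the code from the question, this would be called as extract_links(driver.find_elements_by_tag_name('a')). Newer Selenium releases replace that call with driver.find_elements(By.TAG_NAME, "a").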

Mileta Dulovic
