Finding an element by partial href (Python Selenium)

Question

I'm trying to access text from elements that have different xpaths but very predictable href schemes across multiple pages in a web database. Here are some examples:

<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>

In this example I would want to extract "65J22 (35R30 47A52 65J20 65R30 90C30)"

<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>

In this example I would want to extract "05C80 (05C15)". My web scraper would not be able to search by xpath directly due to the xpaths of my desired elements changing between pages, so I am looking for a more roundabout approach.

My main idea is to use the fact that every href contains "/mathscinet/search/mscdoc.html?code=". Selenium can't directly search for hrefs, but I was thinking of doing something similar to this C# implementation:

Driver.Instance.FindElement(By.XPath("//a[contains(@href, 'long')]"))

To port this over to Python, the only analogous approach I could think of is the in operator, but I am not sure how the syntax would work when everything is nested inside find_element_by_xpath. How would I bring all of these ideas together to obtain my desired text?

driver.find_element_by_xpath("//a['/mathscinet/search/mscdoc.html?code=' in @href]").text

Andrei Suvorkov · Accepted Answer · 2018-07-17T05:08:52.080

If I understand correctly, you want to locate all elements that share the same partial href. You can use this:

elements = driver.find_elements_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)

or, if you want to locate only the first matching element:

driver.find_element_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

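The first snippet (find_elements_by_xpath) gives a list of all matching elements; the second (find_element_by_xpath) returns only the first match.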
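The find_element_by_* helpers used above were deprecated in Selenium 4 and removed in later 4.x releases, where the same lookup is written with driver.find_elements(By.XPATH, ...). Below is a minimal sketch of that form, assuming Selenium 4 is installed. The names href_contains_xpath, link_texts, and MSC_FRAGMENT are hypothetical helpers introduced only for illustration; they are not part of Selenium.

```python
def href_contains_xpath(fragment):
    """Build an XPath matching <a> elements whose href contains `fragment`."""
    # XPath 1.0 string literals have no escape sequences, so this sketch
    # simply rejects fragments that contain a single quote.
    if "'" in fragment:
        raise ValueError("fragment must not contain a single quote")
    return f"//a[contains(@href, '{fragment}')]"


def link_texts(driver, fragment):
    """Return the visible text of every matching link on the current page."""
    # Imported here so the XPath helper above can be used without Selenium.
    from selenium.webdriver.common.by import By

    elements = driver.find_elements(By.XPATH, href_contains_xpath(fragment))
    return [element.text for element in elements]


# Hypothetical constant: the common href prefix taken from the question.
MSC_FRAGMENT = "/mathscinet/search/mscdoc.html?code="
print(href_contains_xpath(MSC_FRAGMENT))
```

With an already-created driver that has loaded one of the pages, link_texts(driver, MSC_FRAGMENT) would return a list of strings such as "05C80 (05C15)", assuming the page contains links like the ones in the question.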

score 1 · Answer 2 · answered Jul 17 '18 at 06:52

As per the HTML you have shared, @AndreiSuvorkov's answer should cover your current requirement. You can be more specific and construct a tighter xpath by:

Using starts-with instead of contains
Including the ?code= part of the @href attribute

Your effective code block will be:

all_elements = driver.find_elements_by_xpath("//a[starts-with(@href,'/mathscinet/search/mscdoc.html?code=')]")
for elem in all_elements:
    print(elem.get_attribute("innerHTML"))
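One difference from the accepted answer is worth noting: get_attribute("innerHTML") returns the raw markup inside the element, including the line break and trailing space visible in the question's HTML, whereas .text returns the rendered text. If the innerHTML route is used, the whitespace can be normalized afterwards. The sketch below assumes the link contains only plain text (no nested tags or HTML entities); clean_link_text is a hypothetical helper name, and the raw string is copied from the question's second example.

```python
def clean_link_text(raw):
    """Collapse runs of whitespace and strip the ends of a raw innerHTML string."""
    # str.split() with no arguments splits on any whitespace, including
    # newlines, and drops empty pieces, so joining with one space normalizes it.
    return " ".join(raw.split())


# Raw inner content of the second example link in the question.
raw_inner_html = "\n05C80 (05C15) "
print(clean_link_text(raw_inner_html))
```

For links like these, using elem.text in the loop instead avoids the extra step.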
