The following code returns every <a> tag that is the direct parent of a <span> with class "text"; from what I could see on the page, all links carrying that data-normalized-text attribute contain such a span. The setup is for Linux, but you can adapt it to your own system; only the imports and the code after the browser/driver definition matter:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")
webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'https://www.dnca-investments.com/documents'
browser.get(url)
elems = WebDriverWait(browser,10).until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='text']/parent::a")))
print('Total links:', len(elems))
for elem in elems:
    # print the full markup of each matching link
    print(elem.get_attribute('outerHTML'))
This will return:
Total links: 1205
<a tabindex="0" class="" data-normalized-text="<span class="text">LU1791428052 (Part H-I (CHF))</span>" data-tokens="null"><span class="text">LU1791428052 (Part H-I (CHF))</span><span class="glyphicon glyphicon-ok check-mark"></span></a>
<a tabindex="0" class="" data-normalized-text="<span class="text">LU1694789535 (Part B)</span>" data-tokens="null"><span class="text">LU1694789535 (Part B)</span><span class="glyphicon glyphicon-ok check-mark"></span></a>
<a tabindex="0" class="" data-normalized-text="<span class="text">LU1694789451 (Part A)</span>" data-tokens="null"><span class="text">LU1694789451 (Part A)</span><span class="glyphicon glyphicon-ok check-mark"></span></a>
<a tabindex="0" class="" data-normalized-text="<span class="text">LU1694789378 (Part I)</span>" data-tokens="null"><span class="text">LU1694789378 (Part I)</span><span class="glyphicon glyphicon-ok check-mark"></span></a>
[...]
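If you only need the identifier and share class rather than the full markup, the link text can be split with a small helper. This is a minimal sketch: the function name parse_link_text and the regular expression are my own, and they assume every link text has the shape "<ISIN> (<share class>)" seen in the sample output above, which may not hold for all 1205 links.

```python
import re

# Assumed shape of the link text, based on the sample output:
# "<12-character ISIN> (<share class>)", e.g. "LU1791428052 (Part H-I (CHF))".
LINK_TEXT_RE = re.compile(
    r"^\s*(?P<isin>[A-Z]{2}[A-Z0-9]{9}\d)\s*\((?P<share_class>.*)\)\s*$"
)


def parse_link_text(text):
    """Return (isin, share_class) for a link text, or None if it does not match."""
    match = LINK_TEXT_RE.match(text)
    if match is None:
        return None
    return match.group("isin"), match.group("share_class")


# Standalone check on one of the strings from the sample output.
print(parse_link_text("LU1791428052 (Part H-I (CHF))"))

# With the elements collected by Selenium this would be something like:
#   rows = [parse_link_text(elem.get_attribute("textContent")) for elem in elems]
```

I would read textContent rather than elem.text here, because Selenium's text property returns only visible text and these links may sit inside a closed dropdown.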
Note that you can also walk up to further ancestors of these links (for example, the element that groups a category) and then come back down to grab only the links you want. The Selenium documentation can be found at https://www.selenium.dev/documentation/
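As a rough illustration of scoping the search to one ancestor, the XPath from the code above can be prefixed with the container you are interested in. The helper name links_under and the div with id "funds" are hypothetical; I have not checked which containers the page actually uses, so inspect the markup and substitute the real one.

```python
def links_under(ancestor_xpath):
    """Build an XPath for the <a> parents of span.text nodes below a given ancestor."""
    return f"{ancestor_xpath}//span[@class='text']/parent::a"


# Hypothetical container: replace with an element that really exists on the page.
xpath = links_under("//div[@id='funds']")
print(xpath)

# With the browser from the code above this would be used as:
#   elems = browser.find_elements(By.XPATH, xpath)
#
# Going the other way, from a link up to its nearest <li> ancestor
# (assuming the link sits inside one):
#   item = elem.find_element(By.XPATH, "./ancestor::li[1]")
```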