
I am not sure why I can't locate this element. I am using Selenium because the page loads dynamically.

Here is my code:

driver.get(singleData['itemLink'])
WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"section#description")))
srce = driver.page_source
sp = BeautifulSoup(srce, 'lxml')

I can get its parent element like this:

down = sp.find(id = "attachments-links")

but I could not find the a tag inside its first div. I tried:

down3 = sp.find("a", attrs={"class": "usa-button-small usa-button-gray ng-star-inserted"})
down = sp.select("#attachments-links>div.download-container-header>span>a")

Neither of them works: find returns None and select returns [].

I can reach the h2 tag which is just above it, like this:

down = sp.find(id = "attachments-links").find('div') 

and printing down gives me:

<div class="download-container-header"><h2 id="opp-view-attachments-section-title">Attachments/Links</h2><!-- --></div>

link: https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1

Your help is much appreciated.
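Parsing just the fragment shown above reproduces the empty result — the a tag is simply not present in the static HTML, because Angular only inserts it after the page is interacted with. A minimal check (the outer tag name here is an assumption; only the inner div is known from the printed output):

```python
from bs4 import BeautifulSoup

# The only markup present inside #attachments-links before any interaction,
# reconstructed from the printed fragment above
fragment = (
    '<div id="attachments-links">'
    '<div class="download-container-header">'
    '<h2 id="opp-view-attachments-section-title">Attachments/Links</h2>'
    '<!-- --></div>'
    '</div>'
)

sp = BeautifulSoup(fragment, "html.parser")

# Both lookups from the question come back empty: there is no <a> to find
print(sp.find("a", attrs={"class": "usa-button-small usa-button-gray ng-star-inserted"}))  # None
print(sp.select("#attachments-links>div.download-container-header>span>a"))  # []
```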

Talib Daryabi

6 Answers


The page first needs to be clicked or scrolled so that the attachment links render; only then can the correct information be extracted from the page source.

from selenium import webdriver
from bs4 import BeautifulSoup

URL = "https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1"

driver = webdriver.Chrome()
driver.get(URL)
# Implicitly wait up to 5 seconds when locating elements
driver.implicitly_wait(5)

# Click on an element of the page
driver.find_element_by_css_selector("h2#opp-view-attachments-section-title").click()

soup = BeautifulSoup(driver.page_source, "lxml")

button = soup.find("a", attrs={"class": "usa-button-small usa-button-gray ng-star-inserted"})["href"]
print(button)

Output:

https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token=
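One small caveat about the last line: indexing a tag with ["href"] raises a KeyError if the attribute is missing, whereas .get("href") returns None. A quick illustration on a hand-written tag:

```python
from bs4 import BeautifulSoup

# An <a> without an href attribute, for demonstration
soup = BeautifulSoup('<a class="usa-button-small">Download</a>', "html.parser")
link = soup.find("a")

# link["href"] would raise KeyError here; .get() degrades gracefully
print(link.get("href"))  # None
```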
MendelG

You can construct the download link from the URL (without using selenium or beautifulsoup):

import re


url = 'https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1'

opp_id = re.search(r'opp/([^/]+)', url).group(1)
download_url = 'https://beta.sam.gov/api/prod/opps/v3/opportunities/{opp_id}/resources/download/zip?api_key=null&token='.format(opp_id=opp_id)

print(download_url)

Prints:

https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token=
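The opportunity ID can also be pulled out without a regex, by splitting the URL path (same result, standard library only):

```python
from urllib.parse import urlsplit

url = 'https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1'

# The path is "/opp/<opp_id>/view", so the ID is the second path segment
opp_id = urlsplit(url).path.split('/')[2]
download_url = f'https://beta.sam.gov/api/prod/opps/v3/opportunities/{opp_id}/resources/download/zip?api_key=null&token='

print(download_url)
```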
Andrej Kesely
  • Hey Andrej Kesely, if I paste the provided link directly into the browser, or if I put the link as the href value of an a tag, either way the link won't work – Talib Daryabi Oct 06 '20 at 08:55
  • @TalibDaryabi My script produces `https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token=` Opening this link in Firefox I can download the ZIP file. – Andrej Kesely Oct 06 '20 at 08:59
  • @TalibDaryabi You cannot reach https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token= ? If not, maybe the server blocks you... – Andrej Kesely Oct 06 '20 at 09:07
  • I have asked a question, can you help me with the solution https://stackoverflow.com/questions/64640225/the-urllib-request-return-me-an-empty-data-while-the-same-request-in-postman-re – Talib Daryabi Nov 02 '20 at 05:38

Please use the following XPath to click the button:

//*[contains(text(),'Download All Attachments/Links')]

(The text "Download All Attachments/Links" is copied from the webpage.)

Justin Lambert

Use the XPath:

   //span[contains(@class,'download-button')]/a[@class='usa-button-small usa-button-gray ng-star-inserted']

and then read the link with getAttribute("href").
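A corrected form of this XPath (straight quotes, contains() given its two arguments, and "gray" matching the class name in the question) can be sanity-checked offline with lxml; the fragment below is a hypothetical reconstruction of the rendered markup, using the download URL reported in the other answers:

```python
from lxml import etree

# Hypothetical fragment shaped like the attachments section after rendering
fragment = etree.fromstring(
    '<div id="attachments-links">'
    '<span class="download-button">'
    '<a class="usa-button-small usa-button-gray ng-star-inserted" '
    'href="https://beta.sam.gov/api/prod/opps/v3/opportunities/'
    '8f1efc97df214010b46631c74e6a8aa0/resources/download/zip'
    '?api_key=null&amp;token=">Download All Attachments/Links</a>'
    '</span>'
    '</div>'
)

xpath = ("//span[contains(@class,'download-button')]"
         "/a[@class='usa-button-small usa-button-gray ng-star-inserted']")
links = fragment.xpath(xpath)

# The anchor matches and its href can be read
print(links[0].get('href'))
```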

YourHelper

Just grab the element and print its href attribute.

downloadUrl=WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, " div.download-container-header > span > a"))).get_attribute('href')
print(downloadUrl)

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
Arundeep Chohan

String hrefElement = driver.findElement(By.xpath("//*[contains(text(),'Download All Attachments/Links')]")).getAttribute("href");

Then print that value.

Justin Lambert