
I am not sure why I can't locate this element. I am using Selenium because the page loads dynamically.

Here is my code:

driver.get(singleData['itemLink'])
WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"section#description")))
srce = driver.page_source
sp = BeautifulSoup(srce, 'lxml')

I can get its parent element like this:

down = sp.find(id = "attachments-links")

but I could not find the a tag inside its first div. I tried:

down3 = sp.find("a", attrs={"class": "usa-button-small usa-button-gray ng-star-inserted"})
down = sp.select("#attachments-links>div.download-container-header>span>a")

Neither of them works: find returns None and select returns [].

I can reach the h2 tag which is just above it, like this:

down = sp.find(id = "attachments-links").find('div') 

and printing down gives me:

<div class="download-container-header"><h2 id="opp-view-attachments-section-title">Attachments/Links</h2><!-- --></div>

link: https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1

Your help is much appreciated.
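Parsing just the fragment shown above reproduces the empty result — the a tag is simply not present in the static HTML, because Angular only inserts it after the page is interacted with. A minimal check (the outer tag name here is an assumption; only the inner div is known from the printed output):

```python
from bs4 import BeautifulSoup

# The only markup present inside #attachments-links before any interaction,
# reconstructed from the printed fragment above
fragment = (
    '<div id="attachments-links">'
    '<div class="download-container-header">'
    '<h2 id="opp-view-attachments-section-title">Attachments/Links</h2>'
    '<!-- --></div>'
    '</div>'
)

sp = BeautifulSoup(fragment, "html.parser")

# Both lookups from the question come back empty: there is no <a> to find
print(sp.find("a", attrs={"class": "usa-button-small usa-button-gray ng-star-inserted"}))  # None
print(sp.select("#attachments-links>div.download-container-header>span>a"))  # []
```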

Talib Daryabi

6 Answers


The page first needs to be clicked or scrolled so that the attachment links render; only then can the correct information be extracted from the page source.

from selenium import webdriver
from bs4 import BeautifulSoup

URL = "https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1"

driver = webdriver.Chrome()
driver.get(URL)
# Implicitly wait up to 5 seconds when locating elements
driver.implicitly_wait(5)

# Click on an element of the page
driver.find_element_by_css_selector("h2#opp-view-attachments-section-title").click()

soup = BeautifulSoup(driver.page_source, "lxml")

button = soup.find("a", attrs={"class": "usa-button-small usa-button-gray ng-star-inserted"})["href"]
print(button)

Output:

https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token=
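One small caveat about the last line: indexing a tag with ["href"] raises a KeyError if the attribute is missing, whereas .get("href") returns None. A quick illustration on a hand-written tag:

```python
from bs4 import BeautifulSoup

# An <a> without an href attribute, for demonstration
soup = BeautifulSoup('<a class="usa-button-small">Download</a>', "html.parser")
link = soup.find("a")

# link["href"] would raise KeyError here; .get() degrades gracefully
print(link.get("href"))  # None
```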
MendelG

You can construct the download link from the URL (without using selenium or beautifulsoup):

import re


url = 'https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1'

opp_id = re.search(r'opp/([^/]+)', url).group(1)
download_url = 'https://beta.sam.gov/api/prod/opps/v3/opportunities/{opp_id}/resources/download/zip?api_key=null&token='.format(opp_id=opp_id)

print(download_url)

Prints:

https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token=
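The opportunity ID can also be pulled out without a regex, by splitting the URL path (same result, standard library only):

```python
from urllib.parse import urlsplit

url = 'https://beta.sam.gov/opp/8f1efc97df214010b46631c74e6a8aa0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1'

# The path is "/opp/<opp_id>/view", so the ID is the second path segment
opp_id = urlsplit(url).path.split('/')[2]
download_url = f'https://beta.sam.gov/api/prod/opps/v3/opportunities/{opp_id}/resources/download/zip?api_key=null&token='

print(download_url)
```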
Andrej Kesely
  • Hey Andrej Kesely, if I paste the provided link directly into the browser, or if I put the link as the href value of an a tag, either way the link won't work – Talib Daryabi Oct 06 '20 at 08:55
  • @TalibDaryabi My script produces `https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token=` Opening this link in Firefox I can download the ZIP file. – Andrej Kesely Oct 06 '20 at 08:59
  • @TalibDaryabi You cannot reach https://beta.sam.gov/api/prod/opps/v3/opportunities/8f1efc97df214010b46631c74e6a8aa0/resources/download/zip?api_key=null&token= ? If not, maybe the server blocks you... – Andrej Kesely Oct 06 '20 at 09:07
  • I have asked a question, can you help me with the solution https://stackoverflow.com/questions/64640225/the-urllib-request-return-me-an-empty-data-while-the-same-request-in-postman-re – Talib Daryabi Nov 02 '20 at 05:38

Please use the following XPath to click the button:

//*[contains(text(),'Download All Attachments/Links')]

(The text "Download All Attachments/Links" is copied from the webpage.)

Justin Lambert

Use the XPath:

   //span[contains(@class,'download-button')]/a[@class='usa-button-small usa-button-gray ng-star-inserted']

and then read the link with getAttribute("href").
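A corrected form of this XPath (straight quotes, contains() given its two arguments, and "gray" matching the class name in the question) can be sanity-checked offline with lxml; the fragment below is a hypothetical reconstruction of the rendered markup, using the download URL reported in the other answers:

```python
from lxml import etree

# Hypothetical fragment shaped like the attachments section after rendering
fragment = etree.fromstring(
    '<div id="attachments-links">'
    '<span class="download-button">'
    '<a class="usa-button-small usa-button-gray ng-star-inserted" '
    'href="https://beta.sam.gov/api/prod/opps/v3/opportunities/'
    '8f1efc97df214010b46631c74e6a8aa0/resources/download/zip'
    '?api_key=null&amp;token=">Download All Attachments/Links</a>'
    '</span>'
    '</div>'
)

xpath = ("//span[contains(@class,'download-button')]"
         "/a[@class='usa-button-small usa-button-gray ng-star-inserted']")
links = fragment.xpath(xpath)

# The anchor matches and its href can be read
print(links[0].get('href'))
```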

YourHelper

Just grab the element and print its href attribute.

downloadUrl=WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, " div.download-container-header > span > a"))).get_attribute('href')
print(downloadUrl)

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
Arundeep Chohan

String hrefElement = driver.findElement(By.xpath("//*[contains(text(),'Download All Attachments/Links')]")).getAttribute("href");

Then print that value.

Justin Lambert