Using Inspect Element on the web page, I can see the link for the anchor tag correctly (screenshot: "Image for that"), but when I try to get it through BeautifulSoup, I get a different result (screenshot: "Image of output (soup)"). I tried the lxml and html5lib parsers but couldn't find a solution.

2 Answers

I had a similar issue: some chunks of the HTML page that I was scraping were not loaded correctly. I ended up scraping with PhantomJS via Selenium. Here's an example. And another one.

There's also dryscrape, which I've never used, but it might do the trick.
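
As a rough illustration of the browser-driven approach, here is a minimal sketch. It is not the code I originally used: PhantomJS development has since been suspended, so the sketch swaps in headless Chrome through Selenium 4, and it assumes Chrome and a matching driver are available on the machine. The names `make_headless_chrome` and `render_html` are hypothetical helpers introduced only for this example.

```python
def make_headless_chrome():
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome are installed)."""
    # Imported lazily so the module can be loaded even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def render_html(url, driver_factory=make_headless_chrome):
    """Load `url` in a real browser and return the HTML after the page has loaded."""
    driver = driver_factory()
    try:
        driver.get(url)
        # page_source reflects the DOM as the browser currently sees it,
        # including markup added by scripts that have already run.
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


# Hypothetical usage: pass the rendered HTML to BeautifulSoup as usual.
# html = render_html("https://example.com/some-page")
# soup = BeautifulSoup(html, "html.parser")
```

If the content you need is added by scripts after the initial load, you may also need an explicit wait (for example Selenium's `WebDriverWait`) before reading `page_source`.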

hkr3

I was able to get the href by specifying a User-Agent in the request headers. The site may be designed to give different responses to different browsers, so it is better to use a User-Agent similar to that of the browser you used to inspect the page.

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; the site may respond differently without one.
url = 'https://co.jim-hogg.tx.us/index.php/bids/278-solid-waste-resedential-collection-disposal-bids'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

# Print the href of the first anchor inside the article body.
print(soup.find("div", {"itemprop": "articleBody"}).a['href'])

Output

http://www.jimhoggcounty.net/files/BIDS/Notice%20for%20bids%20on%20Solid%20Waste%20Residential%20%26%20CommercialCollection.pdf

Note:
My region was blocked by the site, so I had to use a proxy to get a response. I have removed that additional code from the example above.
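
If you want to check whether the User-Agent really is what changes the response, something like the following might help. It is only a sketch: it uses the standard library (`urllib.request` and `html.parser`) instead of requests and BeautifulSoup so that it has no dependencies, and `HrefCollector`, `extract_hrefs`, `build_request`, `fetch_hrefs` and `links_only_seen_by_browser` are hypothetical names made up for this example.

```python
import urllib.request
from html.parser import HTMLParser

# Same browser-like User-Agent string as in the answer above.
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/39.0.2171.95 Safari/537.36")


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all anchor hrefs found in an HTML string, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


def build_request(url, user_agent=None):
    """Build a request; without user_agent, urllib sends its own default one."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_hrefs(url, user_agent=None, timeout=10):
    """Download the page and return the hrefs of its anchors."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_hrefs(resp.read().decode(charset, errors="replace"))


def links_only_seen_by_browser(url):
    """Return hrefs that appear only when a browser-like User-Agent is sent."""
    default_links = set(fetch_hrefs(url))
    browser_links = set(fetch_hrefs(url, BROWSER_UA))
    return sorted(browser_links - default_links)
```

A non-empty result from `links_only_seen_by_browser(url)` would suggest the server varies its HTML by User-Agent. An empty result does not rule out other causes, such as content added by JavaScript or region blocking.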

Bitto