Couldn't get the 'href' from anchor tag

Question

Using Inspect Element on the web page, I can see the anchor tag's link correctly, but when I try to get it through BeautifulSoup, the result is different. I tried the lxml and html5lib parsers but couldn't find a solution.

Add your code and an HTML snippet. Community members won't type out code to reproduce the problem, and adding a screenshot is bad practice. — Dev, Jan 09 '19 at 08:34

score 0 · Answer 1 · answered Jan 09 '19 at 11:50

I had a similar issue: some chunks of the HTML page I was scraping were not loaded correctly. I ended up scraping with PhantomJS via Selenium.

There's also dryscrape, which I've never used, but it might do the trick.

score 0 · Answer 2 · answered Jan 09 '19 at 16:28

I was able to get the href by specifying a User-Agent in the headers. The site may be designed to respond differently to different browsers, so it is better to use a User-Agent similar to that of the browser you used to inspect the page.

import requests
from bs4 import BeautifulSoup

url = 'https://co.jim-hogg.tx.us/index.php/bids/278-solid-waste-resedential-collection-disposal-bids'
# Send a browser-like User-Agent; the site may serve different HTML to other clients.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# Print the href of the first anchor inside the article body div.
print(soup.find("div", {"itemprop": "articleBody"}).a['href'])

Output

http://www.jimhoggcounty.net/files/BIDS/Notice%20for%20bids%20on%20Solid%20Waste%20Residential%20%26%20CommercialCollection.pdf

Note:
My region was blocked by the site, so I had to use a proxy to get a response. I have removed that additional code.
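
A quick way to tell the two answers' scenarios apart is to check whether the href is present in the raw HTML the server returned, before any JavaScript runs. The following is a minimal sketch of that check using only the standard library's html.parser rather than BeautifulSoup. The names ArticleLinkParser and article_hrefs are made up for illustration, and it assumes the link sits inside a div with itemprop="articleBody", as in the code above.

```python
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div itemprop="articleBody">.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # greater than 0 while inside the article body div
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Start counting at the article body div, then track nested divs.
            if self.div_depth > 0 or attrs.get("itemprop") == "articleBody":
                self.div_depth += 1
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def article_hrefs(html):
    """Return every href found inside the article body of a raw HTML string."""
    parser = ArticleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

Passing r.text from the request to article_hrefs should return a list of links. If the list is empty both with and without the User-Agent header, the anchor is probably filled in by JavaScript and a browser-driven approach like the one in Answer 1 is the likelier fix. If the list is only non-empty with the header, the site is varying its response by client.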
