How to print only the URL from a web scraper

Question

I'm building a web scraper and want it to retrieve the URL from a title. This is the code I'm currently using:

for item in g_data:
    print(item.contents[1].find_all("a", {"class": "a-link-normal"})[1])

And this prints:

<a class="a-link-normal s-access-detail-page a-text-normal"
 href="http://www.amazon.co.uk/Scotch-BUFF-Brown-Packaging-Parcel/dp/B001OYOI5E"
 title="3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m - Pack of 2">
 <h2 class="a-size-medium a-color-null s-inline s-access-title a-text-normal">3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m - Pack of 2</h2></a>

What I would like is to get just the URL:

"http://www.amazon.co.uk/Scotch-BUFF-Brown-Packaging-Parcel/dp/B001OYOI5E"

However, I'm not sure how to target that specific data. Does anyone know how to do this? I would really appreciate it, thank you.

Although this is similar to the other post, it is different and not as complex. The solution to the other problem might work, but it would require rewriting my code.

possible duplicate of [retrieve links from web page using python and BeautifulSoup](http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup) — Snicksie, May 01 '15 at 09:57
@snicksie I looked at that, and although it is similar, I don't know how I can use that in my code. — Jamie Mumford, May 01 '15 at 10:24
Try this: ```print item.contents[1].find_all("a", {"class": "a-link-normal"})[1]['href']``` (that is, get the href of the anchor element, which contains the URL you want) — Snicksie, May 01 '15 at 10:26

score 0 · Answer 1 · answered May 01 '15 at 10:29


Instead of printing the entire anchor element, you only need the value of its href attribute. You can access this attribute as follows:

for item in g_data:
    # Indexing a Tag like a dict returns the value of that attribute
    print(item.contents[1].find_all("a", {"class": "a-link-normal"})[1]['href'])
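For reference, here is a minimal self-contained sketch of the same idea. It assumes BeautifulSoup 4 (`bs4`) is installed and uses Python 3. The HTML string is abbreviated from the output shown in the question; the first anchor and its `http://example.com/image-link` URL are hypothetical, added only so that index `[1]` picks out the title link, as in the question's loop.

```python
from bs4 import BeautifulSoup

# Markup abbreviated from the question's output. The first <a> and its
# example.com URL are hypothetical, added so that index [1] selects the
# title link, as in the question's loop.
html = """
<div>
  <a class="a-link-normal" href="http://example.com/image-link">image</a>
  <a class="a-link-normal s-access-detail-page a-text-normal"
     href="http://www.amazon.co.uk/Scotch-BUFF-Brown-Packaging-Parcel/dp/B001OYOI5E"
     title="3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m - Pack of 2">
    <h2 class="a-size-medium">3M Scotch BUFF Brown Packaging Parcel Tape</h2>
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued in BeautifulSoup, so this matches every <a> whose
# class list contains "a-link-normal", including the title link.
links = soup.find_all("a", {"class": "a-link-normal"})

# Printing the Tag itself would show its full markup (as in the question);
# indexing it like a dict returns just the attribute value.
title_url = links[1]["href"]
print(title_url)

# .get() returns None instead of raising KeyError when href is missing.
all_urls = [a.get("href") for a in links]
print(all_urls)
```

Indexing with `['href']` raises `KeyError` if an anchor has no `href`, so `.get('href')` is the more forgiving choice when looping over many scraped results.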


Snicksie
