How to extract href links between <h1> tags in Python?

Question

I am new to Python and I'm trying to learn web scraping.

I have the following HTML and would like to know how to get/print the href (the link):

<h1><a href="https://www.nytimes.com/tips"> Got a confidential news tip?</a></h1>

similar to http://stackoverflow.com/questions/42173719/how-to-use-regular-expression-to-retrieve-data-in-python/42173798#42173798 — GoingMyWay, Feb 25 '17 at 09:23
another one similar https://stackoverflow.com/questions/3075550/how-can-i-get-href-links-from-html-using-python — Tudor, Feb 25 '17 at 09:24

score 1 · Answer 1 · edited Jul 05 '17 at 16:58


You can use BeautifulSoup to get this done:

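from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page (replace the URL with the page you want to scrape)
response = urlopen("http://someurl.com")
page_source = response.read()

# Parse the HTML and collect every <h1> element on the page
soup = BeautifulSoup(page_source, 'html.parser')
x = soup.find_all('h1')
print(x)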

Then all you have to do is use the re module to extract the href value from that output.
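A minimal sketch of that regex step, assuming the printed output looks like the markup from the question and that the href value is wrapped in double quotes (the variable names `h1_html`, `HREF_RE` and `links` are just illustrative):

```python
import re

# Assumed input: the kind of string print(x) shows for the question's markup
h1_html = '<h1><a href="https://www.nytimes.com/tips"> Got a confidential news tip?</a></h1>'

# Capture whatever sits between href=" and the next double quote.
# Assumption: attribute values use double quotes; single-quoted hrefs are not matched.
HREF_RE = re.compile(r'href="([^"]*)"')

links = HREF_RE.findall(h1_html)
print(links)  # ['https://www.nytimes.com/tips']
```

Regular expressions are fragile on real-world HTML (different quoting, extra whitespace, attributes in any order). Since the page is already parsed with BeautifulSoup, reading the attribute from the tag itself, e.g. `a.get('href')` on each `<a>` found inside an `<h1>`, avoids the regex entirely.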

edited Jul 05 '17 at 16:58 by cookiedough · answered Feb 25 '17 at 09:27 by likhith lanka
