I am new to Python and I'm trying to learn web scraping.
I have the following HTML and would like to know how to get/print the href (the link):
<h1><a href="https://www.nytimes.com/tips">Got a confidential news tip?</a></h1>
You can use BeautifulSoup to get this done:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Download the page and parse it
response = urlopen("http://someurl.com")
page_source = response.read()
soup = BeautifulSoup(page_source, 'html.parser')

# Collect every <h1> element; each one still contains its <a href="..."> tag
x = soup.find_all('h1')
print(x)
Then all you have to do is use the re module to extract the href from the output.
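The re step could look roughly like the sketch below. It uses only the standard library and works on a sample string standing in for the printed find_all result, built from the markup in the question. The names h1_output, HREF_PATTERN and extract_hrefs are made up for illustration, and the pattern assumes the href values are wrapped in double quotes.

```python
import re

# Sample text standing in for the printed result of soup.find_all('h1');
# the markup comes from the question (an assumption about the real page).
h1_output = (
    '[<h1><a href="https://www.nytimes.com/tips">'
    'Got a confidential news tip?</a></h1>]'
)

# Capture whatever sits between the double quotes of each href attribute.
# Assumption: attribute values are double-quoted.
HREF_PATTERN = re.compile(r'href="([^"]*)"')


def extract_hrefs(html_text):
    """Return every href value found in html_text, in order of appearance."""
    return HREF_PATTERN.findall(html_text)


links = extract_hrefs(h1_output)
for link in links:
    print(link)
```

With real output you would pass str(x) instead of the sample string. Note that BeautifulSoup tags also expose their attributes directly, so calling .get('href') on an a tag is usually simpler and less fragile than a regular expression.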