From a given HTML file I need to extract specific URLs. For example, an <a> tag with its href attribute looks like this:
<a href="https://hoster.com/some_description-specific_name-more_description.html">
I need to extract only the URLs that contain both "hoster.com" and "specific_name".
I have used BeautifulSoup on a Raspberry Pi, but I can only do the basic thing, which extracts all URLs from an HTML file:
from bs4 import BeautifulSoup

# Parse the saved page
with open("page.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# Print the href of every <a> tag (None if the tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))
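
One possible way to narrow this down, as a minimal sketch rather than a definitive solution: keep only the href values that contain every required substring. The function name extract_matching_urls, the required parameter, and the sample HTML are made up for illustration; only the two substrings and the example URL come from the description above.

```python
from bs4 import BeautifulSoup


def extract_matching_urls(html, required=("hoster.com", "specific_name")):
    """Return href values of <a> tags that contain every string in `required`.

    Hypothetical helper for illustration; plain substring matching is assumed
    to be good enough for this use case.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # href=True skips <a> tags that have no href attribute at all
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if all(part in href for part in required):
            urls.append(href)
    return urls


# Made-up sample input: one matching link and three that should be skipped
sample_html = """
<a href="https://hoster.com/some_description-specific_name-more_description.html">match</a>
<a href="https://hoster.com/other_page.html">wrong name</a>
<a href="https://example.org/specific_name.html">wrong host</a>
<a name="top">no href</a>
"""

matching_urls = extract_matching_urls(sample_html)
for url in matching_urls:
    print(url)
```

To run this against the file instead of the sample string, the open file object can be passed in place of sample_html, since BeautifulSoup accepts either. Note that this is a plain substring test, so "hoster.com" would also match if it appeared in the path or query string; if that matters, the host could be checked separately with urllib.parse.urlparse.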