I want to extract the text (Content here) from the following HTML with BeautifulSoup and XPath, respectively. How can this be done?
<div class="paragraph">
<h1>Title here</h1>
Content here
</div>
Expected output:
Content here
There are several ways to achieve that. Here are a few of them, each shown in the code that follows, in the same order.
By using contents
OR
By using next_element
OR
By using next_sibling
OR
By using stripped_strings
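from bs4 import BeautifulSoup

html = '''<div class="paragraph">
<h1>Title here</h1>
Content here
</div>'''
soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', class_='paragraph')

# contents: the div's children are [newline, <h1>, text], so the text is at index 2
print(div.contents[2].strip())
# next_element: the first hop is the h1's own string, the second is the text after it
print(div.find('h1').next_element.next_element.strip())
# next_sibling: the node immediately following the <h1>
print(div.find('h1').next_sibling.strip())
# stripped_strings: all non-empty strings in the div; index 0 is the title
print(list(div.stripped_strings)[1])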
You can use CSS selectors as well.
from bs4 import BeautifulSoup

html = '''<div class="paragraph">
<h1>Title here</h1>
Content here
</div>'''
soup = BeautifulSoup(html, "html.parser")

# Same four techniques, with the elements located by CSS selector
print(soup.select_one('.paragraph').contents[2].strip())
print(soup.select_one('.paragraph > h1').next_element.next_element.strip())
print(soup.select_one('.paragraph > h1').next_sibling.strip())
print(list(soup.select_one('.paragraph').stripped_strings)[1])
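The question also asks about XPath, which the snippets above do not cover. Here is a minimal sketch, assuming the snippet is well-formed XML so that the standard library's xml.etree.ElementTree can parse it (it supports only a limited XPath subset). The <root> wrapper and the variable names are assumptions added for illustration. ElementTree keeps the text that follows an element in that element's tail attribute, so the content is read from the tail of the h1.

```python
import xml.etree.ElementTree as ET

html = '''<div class="paragraph">
<h1>Title here</h1>
Content here
</div>'''

# Wrap the snippet in a hypothetical <root> element so the div can be
# matched by a path expression instead of being the tree root itself.
root = ET.fromstring("<root>" + html + "</root>")

# Limited XPath: find the h1 inside the div whose class is "paragraph".
h1 = root.find(".//div[@class='paragraph']/h1")

# The text after </h1> is stored in h1.tail; guard against None.
content = (h1.tail or "").strip()
print(content)
```

For real-world HTML that is not well-formed, this parser will likely raise an error, so a third-party library with fuller XPath support, such as lxml, is probably a better fit.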