I have an HTML snippet that I need to get data from using BeautifuSoup:
<!doctype html>
<html lang="en">
<body>
<div class="sidebar-box">
<h3><i class="fa fa-users"></i> Management Team</h3>
Chairman, Director
</div>
<div class="sidebar-box">
<h3><i class="fa fa-male"></i> Teacher</h3>
John Doe
</div>
<div class="sidebar-box">
<h3><i class="fa fa-mortar-board"></i> Awards </h3>
National Top Quality Educational Development
</div>
<div class="sidebar-box">
<h3><i class="fa fa-building"></i> School Type</h3>
Secondary
</div>
</body>
</html>
I need to get the .text
value of the second div
from the top "John Doe", but not the .text
value inside the h3
tag in that div
.
My challenge is that currently I get both text values as in this code snippet:
# Python 3.7, BeautifulSoup 4.7
# html variable is equal to the above HTML snippet
from bs4 import BeautifulSoup
soup4 = BeautifulSoup(html, "html.parser")
# Get School Head Teacher
school_head_teacher = soup4.find_all('div', {'class':'sidebar-box'})
school_head_teacher = school_head_teacher[1].text.strip()
print(school_head_teacher)
This outputs:
Teacher
John Doe
However, I only need the John Doe value.