How to extract email text in tag nested in multiple divs using BS4

Question

I am trying to use bs4 to extract this email address. I've tried multiple methods, but the output is still either None or blank.

<div> class = name 1
<div> class = name 2
    <div> class = name 3
        <div>
        <p> blah blah </p>
        <p>
            <a href = "mailto:email@email.com">
            email@email.com </a>
        </p>
    </div>
</div>

This was my first attempt, but I still receive nothing:

from bs4 import BeautifulSoup
import requests

# Download the profile page and parse the raw HTML the server returns
response = requests.get('https://soundcloud.com/camcontrast')
soup = BeautifulSoup(response.text, 'lxml')

# Every <a> inside a <p> inside the element with class "infoStats__description"
for a in soup.select('.infoStats__description p a'):
    print(a['href'], a.get_text(strip=True))
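One way to narrow this down is to test the selector against a static snippet first. The sketch below assumes the description is rendered as a div with class infoStats__description containing <p><a href="mailto:..."> (an assumption about the markup, not checked against SoundCloud), and mailto_links is a hypothetical helper. If the selector matches here but returns nothing on response.text, the problem is likely the HTML you fetched rather than the selector.

```python
from bs4 import BeautifulSoup

# Assumed markup that mirrors the structure the selector expects; not copied from SoundCloud.
SAMPLE_HTML = """
<div class="infoStats__description">
  <div>
    <p>blah blah</p>
    <p><a href="mailto:email@email.com"> email@email.com </a></p>
  </div>
</div>
"""


def mailto_links(html_text, selector=".infoStats__description p a"):
    """Hypothetical helper: return (href, text) pairs for every tag matching the CSS selector."""
    soup = BeautifulSoup(html_text, "html.parser")
    return [(a["href"], a.get_text(strip=True)) for a in soup.select(selector)]


print(mailto_links(SAMPLE_HTML))  # prints [('mailto:email@email.com', 'email@email.com')]
# If mailto_links(response.text) is empty, check whether the class name
# appears in the downloaded page at all: "infoStats__description" in response.text
```

The html.parser backend is used here only so the sketch does not need lxml installed; the selector behaves the same with either parser.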

Which of the multiple methods were you most confident about? Share your code and people will be able to help you. — Grismar, Jun 11 '20 at 04:49

score 0 · Answer 1 · answered Jun 11 '20 at 04:53

BeautifulSoup has a find_all() method that you can use for this purpose. Have a look at the find_all() documentation, and also at the find_all() shortcut: calling the soup object itself, as in data('a'), is equivalent to data.find_all('a').

from bs4 import BeautifulSoup

text = """
<div> class = name 1
<div> class = name 2
    <div> class = name 3
        <div>
        <p> blah blah </p>
        <p>
            <a href = "mailto:email@email.com">email@email.com </a>
        </p>
    </div>
</div>

"""

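data = BeautifulSoup(text, 'html.parser')
# find_all('a') returns every <a> tag, however deeply it is nested
for e in data.find_all('a'):
    # get_text(strip=True) drops the whitespace around the address
    print(e.get_text(strip=True))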
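A quick sketch comparing data.find_all('a') with the shortcut form data('a') on a simplified, single-line version of the snippet; both return the same list of tags.

```python
from bs4 import BeautifulSoup

html_doc = '<div><p>blah blah</p><p><a href="mailto:email@email.com">email@email.com </a></p></div>'
soup = BeautifulSoup(html_doc, "html.parser")

# soup("a") is shorthand for soup.find_all("a")
long_form = soup.find_all("a")
short_form = soup("a")
print(long_form == short_form)  # True

for a in short_form:
    # get_text(strip=True) removes the trailing space inside the <a> tag
    print(a.get_text(strip=True))  # email@email.com
```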

As an aside, you might want to check whether the class text belongs inside the <div> tags, i.e. as a class attribute.
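If the class text does belong inside the tags, the markup would look roughly like the sample below. The class names name-1, name-2 and name-3 are hypothetical placeholders (written as class="name 1" they would actually be two classes, "name" and "1"). With real class attributes you can walk down the nesting with find() and the class_ argument, or do the same in one CSS selector:

```python
from bs4 import BeautifulSoup

# Hypothetical corrected markup; the class names are placeholders, not from the real page.
html_doc = """
<div class="name-1">
  <div class="name-2">
    <div class="name-3">
      <div>
        <p>blah blah</p>
        <p><a href="mailto:email@email.com">email@email.com</a></p>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Narrow the search to the innermost named div, then look for the link inside it
container = soup.find("div", class_="name-3")
link = container.find("a") if container else None
email = link.get_text(strip=True) if link else None
print(email)  # email@email.com

# Equivalent lookup with a CSS selector
same_link = soup.select_one("div.name-3 a")
```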

Humayun Ahmad Rajib · Answer 2 · answered Jun 11 '20 at 05:14

You can use the CSS selector method select() to extract the email address from the <a> tag. Try this:

from bs4 import BeautifulSoup

html_doc="""
<div> class = name 1
<div> class = name 2
    <div> class = name 3
        <div>
        <p> blah blah </p>
        <p>
            <a href = "mailto:email@email.com">email@email.com </a>
        </p>
    </div>
</div>

"""

soup = BeautifulSoup(html_doc, 'lxml')
# select('a') takes a CSS selector and returns a list of matching tags
for a in soup.select('a'):
    print(a.get_text(strip=True))

Output will be:

email@email.com
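select() also accepts attribute selectors, so you can match only links whose href starts with mailto: and read the address from the href itself rather than from the link text (the two can differ on real pages). In this sketch the extra https://example.com link and the ?subject=Hi query string are made up to show the filtering:

```python
from bs4 import BeautifulSoup

html_doc = """
<div>
  <p>blah blah</p>
  <p><a href="https://example.com">not an email</a></p>
  <p><a href="mailto:email@email.com?subject=Hi">email@email.com </a></p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

emails = []
# a[href^="mailto:"] matches <a> tags whose href begins with "mailto:"
for a in soup.select('a[href^="mailto:"]'):
    address = a["href"][len("mailto:"):]
    # Drop any query string such as ?subject=...
    address = address.split("?", 1)[0]
    emails.append(address)

print(emails)  # ['email@email.com']
```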

score 0 · Answer 3 · answered Jun 11 '20 at 05:13

Try getting all the <a> tags using .find_all('a'), then keep only those with a mailto: link in their href.

from bs4 import BeautifulSoup

text = """<div> class = name 1
<div> class = name 2
    <div> class = name 3
        <div>
        <p> blah blah </p>
        <p>
            <a href = "mailto:email@email.com">
            email@email.com </a>
        </p>
    </div>
</div>"""

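soup = BeautifulSoup(text, 'html.parser')
links = soup.find_all('a')

for link in links:
    # link.get('href') would return None for an <a> without href, so default to ''
    href = link.get('href', '')
    if href.startswith('mailto:'):
        # get_text(strip=True) removes the surrounding newlines and spaces
        print(link.get_text(strip=True))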

Output:

email@email.com
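The same filtering can be pushed into find_all() itself: it accepts a compiled regular expression as an attribute filter, and tags that have no href never match, so there is no None case to guard against. A sketch; the anchor without an href is added only for illustration:

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<div>
  <p>blah blah</p>
  <a name="anchor-without-href">no href here</a>
  <p><a href="mailto:email@email.com">
      email@email.com </a></p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# href=re.compile(...) keeps only tags whose href matches; tags with no href are skipped
mail_links = soup.find_all("a", href=re.compile(r"^mailto:"))
emails = [a.get_text(strip=True) for a in mail_links]
print(emails)  # ['email@email.com']
```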

answered Jun 11 '20 at 05:13

Pygirl


I want to use this, but without having to paste in the actual text. I want it to pull the email directly from the website. – Cameron Long Jun 11 '20 at 14:28
Then you have not given us the complete question. This answers your post as it currently stands. Update the question, or better, ask a separate question. – Pygirl Jun 15 '20 at 11:22
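For the follow-up (pulling the address straight from the website), fetching and parsing can be combined. This sketch uses the standard library's urllib.request instead of requests so it needs nothing beyond bs4; the function names, the User-Agent string and the timeout are illustrative assumptions. It only works if the mailto: link is present in the HTML the server sends: BeautifulSoup does not run JavaScript, so if the page builds that section in the browser, the result will be empty and you would likely need the site's API or a browser-automation tool instead.

```python
import urllib.request

from bs4 import BeautifulSoup


def emails_from_html(html_text):
    """Return the unique addresses of all mailto: links in an HTML string, in page order."""
    soup = BeautifulSoup(html_text, "html.parser")
    emails = []
    for a in soup.select('a[href^="mailto:"]'):
        # Strip the "mailto:" prefix and any ?subject=... query string
        address = a["href"][len("mailto:"):].split("?", 1)[0]
        if address and address not in emails:
            emails.append(address)
    return emails


def emails_from_url(url, timeout=10):
    """Download a page and extract mailto: addresses from its raw (un-rendered) HTML.

    The User-Agent value is an arbitrary placeholder; some sites reject
    Python's default one.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html_text = response.read().decode(charset, errors="replace")
    return emails_from_html(html_text)


# Usage (performs a network request):
# print(emails_from_url("https://soundcloud.com/camcontrast"))
```

Keeping the parsing in its own function means it can be checked against saved HTML without touching the network.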
