How to extract the text inside a tag with BeautifulSoup in Python?

Question

Suppose I have an HTML string like this:

<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
    <div id="d3">
        Text 3
    </div>
</html>

I want to extract the text of d2 that is NOT wrapped in other tags, skipping the link text "a url". In other words, I want to get this result:

Text 2
Text 2 continue

Is there a way to do it with BeautifulSoup?

I tried this, but it is not correct, because .text also returns the text of the nested link:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')  # html_doc holds the HTML string above
s = soup.find(id='d2').text
print(s)

Does this answer your question? [Using BeautifulSoup to extract text without tags](https://stackoverflow.com/questions/23380171/using-beautifulsoup-to-extract-text-without-tags) — AMC, Feb 12 '20 at 02:28

score 11 · Accepted Answer · answered Jul 01 '17 at 09:31

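Try .find_all(text=True, recursive=False), which returns only the text nodes that are direct children of the tag (since Beautiful Soup 4.4.0 the text argument is also available under the name string):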

from bs4 import BeautifulSoup
div_test="""
<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
    <div id="d3">
        Text 3
    </div>
</html>
"""
soup = BeautifulSoup(div_test, 'lxml')
s = soup.find(id='d2').find_all(text=True, recursive=False)
print(s)
print([e.strip() for e in s])  # strip the surrounding whitespace

It returns a list containing only the text nodes, first raw and then stripped:

[u'\n        Text 2\n        ', u'\n        Text 2 continue\n    ']
[u'Text 2', u'Text 2 continue']
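As a minimal sketch of the same idea, the call can be wrapped in a small helper that prints exactly the two lines asked for in the question. The helper name `direct_text` is hypothetical, `string=True` is used as the newer spelling of `text=True`, and `html.parser` is assumed instead of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
    <div id="d1">
        Text 1
    </div>
    <div id="d2">
        Text 2
        <a href="http://my.url/">a url</a>
        Text 2 continue
    </div>
</html>
"""


def direct_text(tag):
    """Return the stripped, non-empty text nodes that are direct children of tag.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # string=True matches text nodes; recursive=False skips text nested in child tags.
    pieces = tag.find_all(string=True, recursive=False)
    return [p.strip() for p in pieces if p.strip()]


soup = BeautifulSoup(html_doc, "html.parser")
lines = direct_text(soup.find(id="d2"))
print("\n".join(lines))
```

Filtering out empty strings matters when the tag contains whitespace-only text nodes between child tags.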

score 2 · Answer · t.m.adam · edited Jul 01 '17 at 08:34

You can keep only the NavigableString children of the tag with a simple generator expression.

import bs4  # needed for bs4.element.NavigableString

tag = soup.find(id='d2')
s = ''.join(e for e in tag if type(e) is bs4.element.NavigableString)
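The exact type check matters here: `Comment` is a subclass of `NavigableString`, so an `isinstance` test would also pick up HTML comments. The sketch below illustrates the difference on a made-up one-line document (the comment node and the variable names `exact` and `loose` are assumptions for the example).

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed sample markup with an HTML comment between the text nodes.
html_doc = (
    '<div id="d2">Text 2<!-- note -->'
    '<a href="http://my.url/">a url</a>Text 2 continue</div>'
)
tag = BeautifulSoup(html_doc, "html.parser").find(id="d2")

# Exact type check: keeps plain text nodes only.
exact = [str(e) for e in tag if type(e) is NavigableString]

# isinstance check: also keeps subclasses of NavigableString such as Comment.
loose = [str(e) for e in tag if isinstance(e, NavigableString)]

print(exact)
print(loose)
```

If comments should count as text in your case, the `isinstance` form is the one to use.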

Alternatively, you can use the decompose method to delete all the child tags and then read the remaining text with .text. Note that this modifies the parsed tree.

tag = soup.find(id='d2')
# Remove every tag nested inside d2, leaving only its own text nodes
for e in tag.find_all():
    e.decompose()
s = tag.text
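Because decompose removes elements from the tree for good, one option is to run it on a detached copy of the tag when the rest of the document is still needed. The following is only a sketch under that assumption; it relies on `copy.copy` support for tags (available in recent Beautiful Soup 4 releases), and the variable names are illustrative.

```python
import copy

from bs4 import BeautifulSoup

html_doc = (
    '<div id="d2">Text 2 <a href="http://my.url/">a url</a> '
    "Text 2 continue</div>"
)
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find(id="d2")

# Work on a detached copy so the original tree keeps its <a> element.
clone = copy.copy(tag)

# Removing the direct child tags is enough: their descendants go with them.
for child in clone.find_all(recursive=False):
    child.decompose()

# Collapse runs of whitespace left behind by the removed tag.
text = " ".join(clone.get_text().split())
still_has_link = soup.find("a") is not None

print(text)
print(still_has_link)
```

The original `soup` is left intact, so later lookups such as `soup.find("a")` still succeed.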

edited Jul 01 '17 at 08:34

answered Jul 01 '17 at 08:00

t.m.adam

