
I need to get the text directly inside an element, ignoring the text of its child elements. I tried the following code:

from bs4 import BeautifulSoup

text = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
obj = BeautifulSoup(text, 'html.parser')
obj.find(text=True)

Expected output

aaa ccc?

Current output

aaa

Answer

If you look at the .contents of the tag, you'll see that the text pieces you want are NavigableString objects, while the child <span> is a Tag.

from bs4 import BeautifulSoup, NavigableString

html = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, 'lxml')

for content in soup.find('a').contents:
    print(content, type(content))

# aaa  <class 'bs4.element.NavigableString'>
# <span class="nobreak">bbb</span> <class 'bs4.element.Tag'>
#  ccc? <class 'bs4.element.NavigableString'>
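
One caveat before filtering on that class: in bs4, Comment is a subclass of NavigableString, so an isinstance check also matches HTML comments. The sketch below adds a comment to the sample markup purely for illustration (it is not in the question's HTML) and uses separate names so it doesn't overwrite the soup above:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical markup: the question's <a>, with an HTML comment added for illustration
demo_html = '<a href="#">aaa <!-- note --><span class="nobreak">bbb</span> ccc?</a>'
demo_soup = BeautifulSoup(demo_html, 'html.parser')

for content in demo_soup.find('a').contents:
    # For the comment node, both checks are True because Comment subclasses NavigableString
    print(type(content).__name__,
          isinstance(content, NavigableString),
          isinstance(content, Comment))
```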

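Now you simply need to keep the items that are NavigableString instances and join them together. The original whitespace around the <span> is preserved, which is why the result has two spaces between "aaa" and "ccc?".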

text = ''.join([x for x in soup.find('a').contents if isinstance(x, NavigableString)])
print(text)
# aaa  ccc?
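
If you want exactly the output from the question (a single space, no comments), here is a minimal sketch using find_all with string=True and recursive=False, which restricts the search to the tag's direct children. Stripping each piece and joining with one space is an assumption about how "aaa ccc?" should be produced; it also drops any Comment nodes:

```python
from bs4 import BeautifulSoup, Comment

html = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, 'html.parser')

# string=True matches text nodes; recursive=False looks only at direct children of <a>,
# so "bbb" inside the <span> is never visited
direct_strings = soup.find('a').find_all(string=True, recursive=False)

# Skip HTML comments and whitespace-only pieces, then normalize spacing
text = ' '.join(s.strip() for s in direct_strings
                if not isinstance(s, Comment) and s.strip())
print(text)
# aaa ccc?
```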