BeautifulSoup4 unable to find "a" tag by searching for text

Question

Example HTML

<a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>

When I run

soup.find("a", text="Australia")

it returns nothing.

If I run soup.find("a", href="/identity-checking/individual"), it finds the tag, and
soup.find("a", href="/identity-checking/individual").text returns 'Australia'.

Is it something to do with the comments?

score 1 · Answer 1 · edited May 02 '18 at 12:15

I'd rather stick with the find method, as it is the most convenient and adaptable. The problem here is that the HTML comments interfere with the text match. Removing the comments manually before searching fixes it.

from bs4 import BeautifulSoup, Comment

bs = BeautifulSoup(
    """
    <a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>
    """,
    "lxml"
)
# find all HTML comments and remove them from the tree
comments = bs.find_all(text=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()

r = bs.find('a', text='Australia')
print(r)
#  <a class="accordion-item__link" href="/identity-checking/individual">Australia</a>

The method for removing comments comes from the question "How can I strip comment tags from HTML using BeautifulSoup?"

If the comments are meant to be preserved, you may work on a copy of soup.
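
A minimal sketch of the copy approach, assuming copy.copy() from the standard library is used to duplicate the soup. The variable names and the html.parser choice are illustrative. Note that the tag found this way belongs to the copy, not to the original tree.

```python
import copy
from bs4 import BeautifulSoup, Comment

html = (
    '<a class="accordion-item__link" href="/identity-checking/individual">'
    '<!-- react-text: 178 -->Australia<!-- /react-text --></a>'
)
soup = BeautifulSoup(html, "html.parser")

# Strip comments from a copy so the original tree keeps them.
stripped = copy.copy(soup)
for comment in stripped.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()

# The text search now works on the comment-free copy.
link = stripped.find("a", string="Australia")

# The original soup still contains both comments.
original_comments = soup.find_all(string=lambda s: isinstance(s, Comment))
```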

score 0 · Answer 2 · answered May 02 '18 at 11:56

Try to extract the text after finding the tag, that is:

result = None
for tag in soup.find_all('a'):
    # .text joins the tag's strings and skips comments
    if tag.text == "Australia":
        result = tag
        break

BcK

score 0 · Accepted Answer · answered May 02 '18 at 11:59

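The text match fails when the tag contains HTML comments. The text argument is compared against the tag's .string, which is None when the tag has more than one child, and here the two comments count as children alongside the text.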

You can use this as a workaround:

[ele for ele in soup('a') if ele.text == 'Australia']
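
The following sketch illustrates why the workaround helps and shows a variant that still goes through find by passing a function as the filter. It assumes the html.parser backend, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<a class="accordion-item__link" href="/identity-checking/individual">'
    '<!-- react-text: 178 -->Australia<!-- /react-text --></a>'
)
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", href="/identity-checking/individual")

# The tag has three children (comment, text, comment), so .string is None.
string_value = link.string

# get_text() (and .text) joins the tag's strings and leaves comments out.
text_value = link.get_text()

# find() accepts a function that receives each tag and returns True on a match.
match = soup.find(lambda tag: tag.name == "a" and tag.get_text() == "Australia")
```

The function filter leaves the tree untouched, so the comments stay in place.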

iDrwish
