
Example HTML

<a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>

When I run

soup.find("a", text="Australia")

it returns None.

If I run soup.find("a", href="/identity-checking/individual"), it finds the tag,
and soup.find("a", href="/identity-checking/individual").text returns 'Australia'.

Is it something to do with the comments?
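One way to check whether the comments are to blame is to look at the tag's children directly. The sketch below re-parses the snippet with the built-in "html.parser" (used here instead of lxml only so it runs without extra packages; that choice is an assumption, not part of the question). It prints what BeautifulSoup sees and shows that, in the BeautifulSoup 4 versions I know of, find() with a tag name plus text is compared against .string, which is None here:

```python
from bs4 import BeautifulSoup

HTML = ('<a class="accordion-item__link" href="/identity-checking/individual">'
        '<!-- react-text: 178 -->Australia<!-- /react-text --></a>')

soup = BeautifulSoup(HTML, "html.parser")
tag = soup.find("a", href="/identity-checking/individual")

# The <a> has three children: a Comment, the text node, and another Comment
children = [type(child).__name__ for child in tag.contents]
print(children)  # ['Comment', 'NavigableString', 'Comment']

# .string is None whenever a tag has more than one child
print(tag.string)  # None

# .text skips the comments, so it still reads 'Australia'
print(tag.text)  # Australia

# A tag name plus string= is matched against .string, so nothing is found
missing = soup.find("a", string="Australia")
print(missing)  # None
```

If the children list has more than one entry, the text match cannot succeed even though .text looks correct.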

3 Answers


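I wanted a solution that still uses find, since it is the most convenient and adaptable method. The problem here is the HTML comments: when find gets both a tag name and text, it compares the text against the tag's .string, and .string is None for a tag with more than one child. This <a> has three children (a comment, the text, and another comment), so nothing matches. Removing the comments manually fixes it: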

from bs4 import BeautifulSoup, Comment

bs = BeautifulSoup(
    """
    <a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>
    """,
    "lxml"
)
# find all HTML comments and remove them from the tree
comments = bs.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()

# the <a> now has a single child, so .string is 'Australia' and the match succeeds
r = bs.find('a', string='Australia')
print(r)
#  <a class="accordion-item__link" href="/identity-checking/individual">Australia</a>

The comment-removal approach comes from the question "How can I strip comment tags from HTML using BeautifulSoup?"

If the comments need to be preserved, work on a copy of the soup instead.
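A minimal sketch of the copy approach, assuming it is acceptable to re-parse the document: find_ignoring_comments is a hypothetical helper name, and it uses "html.parser" so it needs no extra packages. It serializes the soup, parses the result again to get an independent tree, strips the comments from that tree only, and runs the same find call on it:

```python
from bs4 import BeautifulSoup, Comment

HTML = ('<a class="accordion-item__link" href="/identity-checking/individual">'
        '<!-- react-text: 178 -->Australia<!-- /react-text --></a>')

def find_ignoring_comments(soup, name, text):
    """Hypothetical helper: run find(name, string=text) on a comment-free copy."""
    # Serialize and re-parse so the caller's soup is never modified
    clean = BeautifulSoup(str(soup), "html.parser")
    # Collect every Comment node, then detach each one from the copied tree
    for comment in clean.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return clean.find(name, string=text)

soup = BeautifulSoup(HTML, "html.parser")
match = find_ignoring_comments(soup, "a", "Australia")
print(match["href"])  # /identity-checking/individual

# The original tree still contains both comments
print(len(soup.find("a").contents))  # 3
```

Note that the returned tag belongs to the copy, not to the original soup, so changes made to it will not show up in the original tree.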

Keyur Potdar
Blownhither Ma

Find the tags first, then compare each tag's text:

result = None
for tag in soup.find_all('a'):
    # .text ignores the comment nodes, so it is just 'Australia'
    if tag.text == "Australia":
        result = tag
        break
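The same comparison can also be pushed into find itself, because find accepts a function that is called with each tag and should return True for a match. A sketch, where is_australia_link is a hypothetical name and "html.parser" is assumed only to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup

HTML = ('<a class="accordion-item__link" href="/identity-checking/individual">'
        '<!-- react-text: 178 -->Australia<!-- /react-text --></a>')

def is_australia_link(tag):
    # Hypothetical predicate: find() calls it once per tag.
    # .get_text() ignores comment nodes, so the comments don't interfere.
    return tag.name == "a" and tag.get_text() == "Australia"

soup = BeautifulSoup(HTML, "html.parser")
result = soup.find(is_australia_link)
print(result["href"])  # /identity-checking/individual
```

This keeps the single find call, and the predicate can be extended (for example with tag.get_text(strip=True) when the text has surrounding whitespace).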
BcK

Matching a tag by its text fails when the tag also contains HTML comments (the comments are separate child nodes, so the tag's .string is None).

You can use this as a workaround:

[ele for ele in soup('a') if ele.text == 'Australia']
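Another workaround, assuming the text is unique on the page: search for the string on its own and then walk up to the enclosing <a>. Without a tag name, string= is compared against individual text nodes, so the neighbouring comments do not matter; find_parent then returns the nearest ancestor with the given name. As above, "html.parser" is used only so the sketch runs without lxml:

```python
from bs4 import BeautifulSoup

HTML = ('<a class="accordion-item__link" href="/identity-checking/individual">'
        '<!-- react-text: 178 -->Australia<!-- /react-text --></a>')
soup = BeautifulSoup(HTML, "html.parser")

# With no tag name, string= is matched against each text node on its own,
# so the Comment siblings of "Australia" are irrelevant
text_node = soup.find(string="Australia")

# Walk up from the text node to the nearest enclosing <a>
link = text_node.find_parent("a") if text_node is not None else None
print(link["href"])  # /identity-checking/individual
```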
iDrwish