I'm using robobrowser to parse some HTML content. It uses BeautifulSoup internally. How can I find a comment that contains a specified string?

<html>
<body>
<div>
<!-- some commented code here!!!<div><ul><li><div id='ANY_ID'>TEXT_1</div></li>
<li><div>other text</div></li></ul></div>-->
</div>
</body>
</html>

In fact, I need to get TEXT_1 when I know ANY_ID. Thanks.

GhostKU

1 Answer

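Pass a function as the text argument (named string in BeautifulSoup 4.4.0 and later) that checks that the node is a Comment and contains the target string. Then parse the comment's contents with BeautifulSoup again and find the desired element by id: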

from bs4 import BeautifulSoup
from bs4 import Comment

data = """
<html>
<body>
<div>
<!-- some commented code here!!!<div><ul><li><div id='ANY_ID'>TEXT_1</div></li>
<li><div>other text</div></li></ul></div>-->
</div>
</body>
</html>
"""

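soup = BeautifulSoup(data, "html.parser")
# Find the first comment node whose contents mention the id.
comment = soup.find(text=lambda text: isinstance(text, Comment) and "ANY_ID" in text)

# The outer parser treats the comment body as plain text, so parse it again as HTML.
soup_comment = BeautifulSoup(comment, "html.parser")
text = soup_comment.find("div", id="ANY_ID").get_text()
print(text)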

Prints TEXT_1.
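
If the id might be missing, or the page has several comments, the same idea can be wrapped in a small helper that returns None instead of raising. This is only a sketch: the function name text_by_id_in_comments is hypothetical, and it assumes BeautifulSoup 4.4.0 or later, where the filter argument is called string.

```python
from bs4 import BeautifulSoup, Comment


def text_by_id_in_comments(html, element_id):
    """Return the text of the element with the given id found inside
    any HTML comment, or None if no comment contains such an element.

    Hypothetical helper for illustration, not part of bs4 or robobrowser.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Collect every comment node in the document.
    comments = soup.find_all(string=lambda s: isinstance(s, Comment))
    for comment in comments:
        # Re-parse the comment body as HTML and look for the id.
        inner = BeautifulSoup(comment, "html.parser")
        element = inner.find(id=element_id)
        if element is not None:
            return element.get_text()
    return None
```

Calling text_by_id_in_comments(data, "ANY_ID") on the document above should give the same TEXT_1, while an id that appears in no comment gives None.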

Graham
alecxe