How do I use bs4 to parse the text description of an anchor tag, especially when the href link is broken?

Question

I'm practicing using BS4 to parse HTML files. I've run into an issue and can't find a solution anywhere. How would I parse the inside of an anchor tag? I've tried selecting the "href" attribute, but the link contains some added characters that break it.

For instance, I am trying to parse this link to one of my older questions:

<a href = "https://stackoverflow.com/questions/61925957/using-an-api-to-create-data-in-a-react-table" style=
=3D"color: #FFFFFF;font-size: 15px;"> >

But instead, it has some extra characters that break the tag:

<a href = "https://stackoverflow.com/&amp=3D"questions/61925957"=3D"/using-an-api-to-create-data-in-a-react-table" style=
=3D"color: #FFFFFF;font-size: 15px;" >

How would I get the inside of this tag using bs4 so that I can trim it and get my final link? I also want to ignore the style, color, and font-size descriptors.

Please update your question with your attempt in the form of a [mre]. I can't reproduce your issue. — baduker, Mar 07 '23 at 08:05

score 1 · Answer 1 · answered Mar 07 '23 at 08:07

I can't reproduce the issue; this works just fine:

from bs4 import BeautifulSoup

html_sample = """<a href = "https://stackoverflow.com/questions/61925957/using-an-api-to-create-data-in-a-react-table" style=
=3D"color: #FFFFFF;font-size: 15px;"> >"""

# Parse the snippet and read the href attribute of the first <a> tag.
href = BeautifulSoup(html_sample, "lxml").select_one("a")["href"]
print(href)

Output:

https://stackoverflow.com/questions/61925957/using-an-api-to-create-data-in-a-react-table

baduker


I want the second code block parsed, not the first one, which shows how the link is supposed to look. I think the second code block has an HTML encoding/decoding error. – Vishnu Vennelakanti Mar 07 '23 at 08:54
Where did you get the second block from? Please provide a [mre]. – baduker Mar 07 '23 at 09:44
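
On the encoding problem suspected in the comments: the "=3D" sequences and the line ending in "style=" look like quoted-printable encoding, which is common in raw email source. That is an assumption, since the question does not say where the markup came from. If it holds, decoding the text before handing it to bs4 may help. A minimal sketch using the standard-library quopri module (the helper name decode_quoted_printable is made up for illustration):

```python
import quopri


def decode_quoted_printable(raw: str) -> str:
    """Undo quoted-printable encoding (hypothetical helper, not part of bs4).

    '=3D' is turned back into '=', and a '=' at the end of a line is
    treated as a soft line break, so the two lines are joined.
    """
    return quopri.decodestring(raw.encode("utf-8")).decode("utf-8")


# The first sample from the question, copied as posted.
raw_html = (
    '<a href = "https://stackoverflow.com/questions/61925957/'
    'using-an-api-to-create-data-in-a-react-table" style=\n'
    '=3D"color: #FFFFFF;font-size: 15px;"> >'
)

decoded = decode_quoted_printable(raw_html)
print(decoded)
```

The decoded string can then be passed to BeautifulSoup exactly as in the answer above. For the second sample, decoding only removes the "=3D" sequences; the stray "&amp" and the extra quotes stay in the markup, so the href may still need manual cleanup after parsing.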
