Say you have this string:
text = """<p>Bla bla bla.</p><p>Blo blo blo<a
href="http://www.example.com">bli bli</a>.</p><p>blu blu<br>
<span style="font-size: x-small;"><br>
content to remove</span></p>"""
My goal is to remove everything inside <span style="font-size: x-small;"><br>content to remove</span>, along with the opening and closing tags.
So I only want to delete span tags (and their content) when the style attribute is "font-size: x-small;".
My code doesn't work. Here it is:
import re
pattern = re.compile(r"\<span style='font-size: x-small;'\>.*?\</span\>")
new_text = pattern.sub(lambda match: match.group(0).replace(match.group(0),'') ,text)
I'd rather go with plain Python, because I know nothing about regex (as you can see...). But if regex is the way to go, I will take it.
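For reference, two things in the pattern above look like likely culprits: it wraps the attribute value in single quotes while the string uses double quotes, and `.` does not match newlines unless `re.DOTALL` is set (the span here spans two lines). A minimal sketch of a corrected regex, assuming the span never contains a nested span and the attribute is always written exactly this way:

```python
import re

text = """<p>Bla bla bla.</p><p>Blo blo blo<a
href="http://www.example.com">bli bli</a>.</p><p>blu blu<br>
<span style="font-size: x-small;"><br>
content to remove</span></p>"""

# Double quotes around the attribute value, to match the input string.
# re.DOTALL lets "." match newlines, since the span covers two lines.
# Assumption: no nested <span> inside the one being removed.
pattern = re.compile(r'<span style="font-size: x-small;">.*?</span>', re.DOTALL)

# Replace every matching span (tags and content) with an empty string.
new_text = pattern.sub("", text)
print(new_text)
```

This is only a sketch for this exact markup; an HTML parser would probably be more robust if the attribute order, quoting, or whitespace can vary.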