I have an HTML string:
I was surfing http://www.google.com, where I found my tweet,
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span>http://www.google.com</span>
and I want to convert it to this, wrapping bare URLs in links while leaving existing links untouched:
I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet,
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span><a href="http://www.google.com">http://www.google.com</a></span>
I tried this demo.
My Python code is:
import re
p = re.compile(ur'<a\b[^>]*>.*?</a>|((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)', re.MULTILINE)
test_str = u"I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"
for item in re.finditer(p, test_str):
    print item.group(0)
Output:
>>> http://www.google.com,
>>> <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
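For reference, here is a minimal sketch of the kind of replacement I am after, not a robust solution. It keeps the same alternation idea as my pattern (match existing `<a>...</a>` elements first so their URLs are skipped) but uses a simpler URL branch that stops at whitespace, `<`, `>` and `"`, and trims trailing punctuation such as the comma. The names `linkify`, `LINK_OR_URL`, `TRAILING_PUNCT` and `_wrap` are hypothetical, and the sketch assumes URLs never appear in attributes of other tags (for example `<img src="...">`), which a real HTML parser would handle better.

```python
import re

# First branch: an existing <a>...</a> element, matched whole so the URL
# inside it is never seen by the second branch.
# Second branch (group 1): a bare URL, ending at whitespace, <, > or ".
LINK_OR_URL = re.compile(
    r'<a\b[^>]*>.*?</a>|((?:https?|ftp)://[^\s<>"]+)',
    re.IGNORECASE | re.DOTALL,
)

# Assumption: these characters at the end of a URL belong to the sentence.
TRAILING_PUNCT = '.,;:!?'


def _wrap(match):
    url = match.group(1)
    if url is None:
        # An existing anchor matched: return it unchanged.
        return match.group(0)
    stripped = url.rstrip(TRAILING_PUNCT)
    tail = url[len(stripped):]  # punctuation that was trimmed off
    return '<a href="{0}">{0}</a>{1}'.format(stripped, tail)


def linkify(html):
    """Wrap bare URLs in <a> tags, leaving existing anchors alone."""
    return LINK_OR_URL.sub(_wrap, html)
```

Calling `linkify` on the string from the top of this post should wrap both bare `http://www.google.com` occurrences, keep the comma outside the first link, and leave the `tinyurl.com` anchor as it is.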