Replace a url into anchor tag using a Python regex

Question

I Have a HTML string,

I was surfing http://www.google.com, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span>http://www.google.com</span>

to this,

I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span><a href="http://www.google.com">http://www.google.com</a></span>

I try this Demo

my python code is

import re
p = re.compile(ur'<a\b[^>]*>.*?</a>|((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)', re.MULTILINE)
test_str = u"I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"

for item in re.finditer(p, test_str):
    print item.group(0)

Output:

>>> http://www.google.com,
>>> <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>

so what are you missing? you found the url, now just check if its not an already and replace, right? — mikus, Oct 27 '15 at 13:34
@mikus i update my question, when i use it in my python code it return anchor tag also. — i'm PosSible, Oct 27 '15 at 13:39
So the desired output is just `>>> http://www.google.com,` ? — Jim K, Oct 27 '15 at 14:01

score 1 · Answer 1 · answered Oct 27 '15 at 16:49

I hope this can help you.

Code:

import re
p = re.compile(ur'''[^<">]((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)[^< ,"'>]''', re.MULTILINE)
test_str = u"I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"

for item in re.finditer(p, test_str):
    result = item.group(0)
    result = result.replace(' ', '')
    print result
    end_result = test_str.replace(result, '<a href="' + result + '">' + result + '</a>')

print end_result

Output:

http://www.google.com
I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>

Its work, but suppose url in span or another tag then it also ignore. i only ignor anchor tag, so help me with this scenario. Thanks!! — i'm PosSible, Oct 28 '15 at 06:46

score 1 · Answer 2 · edited May 23 '17 at 11:58

Ok, I think I finally found what you're looking for. The basic idea is to try to match <a href and a URL. If there is an <a href then don't do anything, but if there is not then add the link. Here is the code:

import re
test_str = """I was surfing http://www.google.com, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span>http://www.google.com</span>
"""
def repl_func(matchObj):
    href_tag, url = matchObj.groups()
    if href_tag:
        # Since it has an href tag, this isn't what we want to change,
        # so return the whole match.
        return matchObj.group(0)
    else:
        return '<a href="%s">%s</a>' % (url, url)

pattern = re.compile(
    r'((?:<a href[^>]+>)|(?:<a href="))?'
    r'((?:https?):(?:(?://)|(?:\\\\))+'
    r"(?:[\w\d:#@%/;$()~_?\+\-=\\\.&](?:#!)?)*)",
    flags=re.IGNORECASE)
result = re.sub(pattern, repl_func, test_str)
print(result)

Output:

I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet,
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span><a href="http://www.google.com">http://www.google.com</a></span>

The main idea is from https://stackoverflow.com/a/3580700/5100564. I also borrowed from https://stackoverflow.com/a/6718696/5100564.

score 0 · Answer 3 · answered Oct 27 '15 at 14:00

0

You could make the regex more complex, but as mikus suggested, it seems easier to do the following:

for item in re.finditer(p, test_str):
    result = item.group(0)
    if not "<a " in result.lower():
        print(result)

answered Oct 27 '15 at 14:00

Jim K

12,824
2
22
51

Replace a url into anchor tag using a Python regex

3 Answers3