RegEx for replacing all groups with one row

Question

For example, I have this string:

<ul><li><ahref="http://test.com">sometext</a></li></ul>

and I want this output:

<ul><li>[URL href="http://test.com"]sometext[/URL]</li></ul>

So I create this regex, to matches <ahref - first group, "> - second group and </a> - third group, to replace them with [URL for first group, "] for second group and [/URL] for third group:

pattern = r'(<a ?href).+(">).+(<\/a>)'

It matches the groups, but now I don't know how to replace them.

have you tried [```re.sub```](https://stackoverflow.com/questions/45142327/how-to-replace-multiple-matches-groups-with-python-regexes) — Xosrov, Jun 09 '19 at 20:35
@Xosrov I tried with `re.sub` but it replace me whole match, not just the groups — MorganFreeFarm, Jun 09 '19 at 20:41
You can use `re.sub` if you group the things you _not_ want to change and refer to them in replacement as `\1`, `\2` — Michael Butscher, Jun 09 '19 at 20:45
@MorganFreeFarm, if it's for html file - I would recommend a more robust approach — RomanPerekhrest, Jun 09 '19 at 20:48

Emma · Accepted Answer · 2019-06-09T20:54:30.443

Here, we would capture what we wish to replace using 4 capturing groups, with an expression similar to:

(<ul><li>)<a\s+href=\"(.+?)\">(.+?)<\/a>(<\/li><\/ul>)

Demo 1

For missing space, we would simply use:

(<ul><li>)<ahref=\"(.+?)\">(.+?)<\/a>(<\/li><\/ul>)

Demo 2

If we might have both instances, we would add an optional space group using a capturing or non-capturing group:

(<ul><li>)<a(\s+)?href=\"(.+?)\">(.+?)<\/a>(<\/li><\/ul>)

Demo 3

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(<ul><li>)<a\s+href=\"(.+?)\">(.+?)<\/a>(<\/li><\/ul>)"

test_str = "<ul><li><a href=\"http://test.com\">sometext</a></li></ul>
"

subst = "\\1[URL href=\"\\2\"]\\3[/URL]\\4"

# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

if result:
    print (result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

RegEx Circuit

jex.im visualizes regular expressions:

Thank you! But i'm curious, is there a way to said something like "replace group 1 from this match of this string with something" — MorganFreeFarm, Jun 09 '19 at 20:56

score 1 · Answer 2 · answered Jun 09 '19 at 20:48

import re
text = "<ul><li><ahref=\"http://test.com\">sometext</a></li></ul>"
pattern = r'(<a ?href).+(">).+(<\/a>)'
url = re.findall('".*"', text)[0]
value = re.findall('>\w+<', text)[0][1:-1]
new_text = re.sub(pattern, '[URL href=' + url + "]" + value + '[/URL]', text)
print(new_text)