6

In Python in the re module there is the following function:

re.sub(pattern, repl, string, count=0, flags=0) – Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged.

I've found it can work like this:

print re.sub('[a-z]*\d+','lion','zebra432') # prints 'lion'

I was wondering, is there an easy way to use regular expressions in the replacement string, so that the replacement string contains part of the original regular expression/original string? Specifically, can I do something like this (which doesn't work)?

print re.sub('[a-z]*\d+', 'lion\d+', 'zebra432')

I want that to print 'lion432'. Obviously, it does not. Rather, it prints 'lion\d+'. Is there an easy way to use parts of the matching regular expression in the replacement string?

By the way, this is NOT a special case. Please do NOT assume that the number will always come at the end, the words will always come in the beginning, etc. I want to know a solution to all regexes in general.

Thanks

user3745189
  • 521
  • 1
  • 7
  • 17

1 Answers1

12

Place \d+ in a capture group (...) and then use \1 to refer to it:

>>> import re
>>> re.sub('[a-z]*(\d+)', r'lion\1', 'zebra432')
'lion432'
>>>
>>> # You can also refer to more than one capture group
>>> re.sub('([a-z]*)(\d+)', r'\1lion\2', 'zebra432')
'zebralion432'
>>>

From the docs:

Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern.

Note that you will also need to use a raw-string so that \1 is not treated as an escape sequence.