I have found this interesting issue with re.sub:

import re

s = "This: is: a: string:"
print(re.sub(r'\:', r'_', s, re.IGNORECASE))

>>>> This_ is_ a: string:

Notice how only the first two instances were replaced. It seems that explicitly naming the flags argument fixes the issue.

import re

s = "This: is: a: string:"
print(re.sub(r'\:', r'_', s, flags=re.IGNORECASE))

>>>> This_ is_ a_ string_

Can anyone explain this, or is it in fact a bug?

I've run into this kind of problem before when omitting the argument name for string, but never for flags, and with string it usually raises an error.

dwkd

1 Answer

The fourth argument to re.sub is not flags but count:

>>> import re
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.

>>>

This means that you need to pass the flag by keyword, as flags=re.IGNORECASE; otherwise re.IGNORECASE, being the fourth positional argument, is taken as the value of count.

Additionally, the re.IGNORECASE flag is equal to 2:

>>> re.IGNORECASE
2
>>>

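So, by passing re.IGNORECASE as the fourth positional argument in your first example, you effectively wrote count=2 and told re.sub to replace only the first 2 occurrences of : in the string, which it did. This is documented behavior, not a bug.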
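
To make the equivalence concrete, here is a minimal sketch that reuses the string from the question. The variable names (flag_value, as_count, as_flags) are just for illustration, and the unnecessary backslash before the colon is dropped. It passes count by keyword so the effect of the value 2 is visible without relying on positional arguments:

```python
import re

s = "This: is: a: string:"

# The flag is integer-valued; converting it shows the number re.sub receives.
flag_value = int(re.IGNORECASE)

# What the first example effectively did: the flag's value ended up in count.
as_count = re.sub(r":", "_", s, count=flag_value)

# Passing the flag by keyword leaves count at its default of 0 (replace all).
as_flags = re.sub(r":", "_", s, flags=re.IGNORECASE)

print(flag_value)  # 2
print(as_count)    # This_ is_ a: string:
print(as_flags)    # This_ is_ a_ string_
```

Passing both count and flags by keyword avoids this mix-up, since the call no longer depends on argument order.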