1

In Postgres 9.5.1 the following command works:

select regexp_replace('JamesBond007','\d+','');

Output:

JamesBond

However the asterisc does not seem to work:

select regexp_replace('JamesBond007','\d*','');

it produces:

JamesBond007

Even more weird things happen when I put something in as replacement string:

select regexp_replace('JamesBond007','\d+','008');

results in:

JamesBond008

while

select regexp_replace('JamesBond007','\d*','008');

gives me back:

008JamesBond007

The Postgres documentation says * = a sequence of 0 or more matches of the atom. So what is happening here? (N.B. in Oracle all the above works as expected)

absurd
  • 1,035
  • 3
  • 14
  • 35

1 Answers1

5

The thing is that \d* can match an empty string and you are not passing the flag g.

See regexp_replace:

The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one.

The \d* matches the empty location at the beginning of the JamesBond007 string, and since g is not passed, that empty string is replaced with 008 when you use select regexp_replace('JamesBond007','\d*','008'); and the result is expected - 008JamesBond007.

With select regexp_replace('JamesBond007','\d*','');, again, \d* matches the empty location at the beginning of the string, and replaces it with an empty string (no visible changes).

Note that Oracle's REGEXP_REPLACE replaces all occurrences by default:

By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string.

In general, you should be cautious when using patterns matching empty strings inside regex-based replace functions/methods. Do it only when you understand what you are doing. If you want to replace digit(s) you usually want to find at least 1 digit. Else, why remove something that is not present in the string in the first place?

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Thanks for the quick answer, Wiktor. I see how it works now. However, I still have problems to understand why Postgres needs to use these flags at all. You could specify case insensitivity within the regex itself. Also, the g flag feature could be expressed by greediness vs non-greediness of the quantifier. And why \d finds an empty location at the beginning of a string, if it should match only digits? – absurd Apr 08 '16 at 11:43
  • @absurd: Why the default behavior is different between Oracle and Postgres, I have no idea, no one asked me when implementing regexp_replace :). As for the `\d*` matching empty location at the beginning, it is totally valid since `*` means *match **zero** or more occurrences*. There are 0 digits at the beginning as the string starts with `J`. That is why it is matched. – Wiktor Stribiżew Apr 08 '16 at 11:46
  • More, flags you need to pass to the regex replace (and other regex methods) differ across languages. In .NET, `Regex.Replace` will replace all occurrences by default, in Ruby/R you need a specific function for each type of replacement (`sub` - once, `gsub` - many), in JS, you need to pass `g` flag with the regex. I think Postgres was decided to be closer to JS in some way. – Wiktor Stribiżew Apr 08 '16 at 11:48
  • Ah, now I see. Thanks again. – absurd Apr 08 '16 at 11:49