
I want to transform every "*" into ".*", except when it is escaped as "\*".

String regex01 = "\\*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("*toto".matches(regex01));// True

String regex02 = "toto*".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex02));// True

String regex03 = "*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex03));// Error

If the "*" is the first character, an error occurs: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0

What is the correct regex?

brouille

3 Answers


At the time of writing, this is the only solution here that can deal with several backslashes (\) in a row:

String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");

How it works

Let's print the string regex to have a look at the actual string being parsed by the regex engine:

\G((?:[^\\*]|\\[\\*])*)[*]

((?:[^\\*]|\\[\\*])*) matches a sequence made up of characters that are neither \ nor *, or of the escape sequences \\ and \*. It matches everything we don't want to touch and puts it in a capturing group so that we can put it back.

The above sequence is followed by an unescaped asterisk, as described by [*].

In order to make sure that we don't "jump" when the regex can't match an unescaped *, \G is used to make sure the next match can only start at the beginning of the string, or from where the last match ends.

Why such a long solution? It is necessary, since the look-behind construct to check whether the number of consecutive \ preceding a * is odd or even is not officially supported by Java regex. Therefore, we need to consume the string from left to right, taking into account escape sequences, until we encounter an unescaped * and replace it with .*.
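The left-to-right consumption described above can also be written as a plain loop, which may be easier to follow than the regex. This is only a sketch: the class name WildcardScanner and the method expandWildcards are hypothetical, and unlike the regex it treats a backslash followed by any character as an escape pair (the regex only recognises \\ and \*). A lone trailing backslash is copied as-is.

```java
final class WildcardScanner {

    /**
     * Replaces every unescaped '*' with ".*".
     * Assumption: a backslash plus the character after it forms an escape pair
     * and is copied through unchanged.
     */
    static String expandWildcards(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Escape pair: copy the backslash and the escaped character together.
                out.append(c).append(input.charAt(i + 1));
                i += 2;
            } else if (c == '*') {
                // Unescaped asterisk: expand it to the regex wildcard.
                out.append(".*");
                i++;
            } else {
                // Ordinary character (or a lone trailing backslash): copy it.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandWildcards("*toto"));      // .*toto
        System.out.println(expandWildcards("\\*toto"));    // \*toto
        System.out.println(expandWildcards("\\\\*toto"));  // \\.*toto
    }
}
```

Because the pair is consumed in one step, the parity of the preceding backslashes never has to be checked explicitly.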

Test program

String inputs[] = {
    "toto*",
    "\\*toto",
    "\\\\*toto",
    "*toto",
    "\\\\\\\\*toto",
    "\\\\*\\\\\\*\\*\\\\\\\\*"};

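for (String input : inputs) {
    String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
    System.out.println(input);
    // Pattern (java.util.regex.Pattern) compiles the result, so an invalid regex
    // would throw PatternSyntaxException here; printing it shows the pattern text.
    System.out.println(Pattern.compile(regex));
    System.out.println();
}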

Sample output

toto*
toto.*

\*toto
\*toto

\\*toto
\\.*toto

*toto
.*toto

\\\\*toto
\\\\.*toto

\\*\\\*\*\\\\*
\\.*\\\*\*\\\\.*
nhahtdh

You need to use negative lookbehind here:

String regex01 = input.replaceAll("(?<!\\\\)\\*", ".*");

(?<!\\\\) is a negative lookbehind: the * is matched only if it is not preceded by a backslash.

Examples:

regex01 = "\\*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> \*toto

regex01 = "*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> .*toto
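As a rough check, the lookbehind pattern can be run against the three cases from the question. The class name LookbehindDemo and the helper expand are made up for this sketch. The last line shows the double-backslash input: the lookbehind sees one backslash before the * and leaves it alone, which may or may not be what you want, depending on whether \\ is meant to be an escaped backslash.

```java
final class LookbehindDemo {

    /** Replaces each '*' that is not directly preceded by a backslash with ".*". */
    static String expand(String input) {
        return input.replaceAll("(?<!\\\\)\\*", ".*");
    }

    public static void main(String[] args) {
        // The three cases from the question.
        System.out.println("*toto".matches(expand("\\*toto")));  // true, regex is \*toto
        System.out.println("tototo".matches(expand("toto*")));   // true, regex is toto.*
        System.out.println("tototo".matches(expand("*toto")));   // true, regex is .*toto

        // A '*' after an escaped backslash is left untouched by this pattern.
        System.out.println(expand("\\\\*toto"));                 // \\*toto
    }
}
```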
anubhava
  • @gregouille is it possible that `*` will be placed after escaped ``\``? Something like `"\\\\*"`? If yes what is expected result? – Pshemo May 06 '15 at 12:28
  • Yes it is possible, but it is the second part of my problem, thank you – brouille May 06 '15 at 12:39
  • @gregouille: The case Pshemo brought up is taken care of in my answer. – nhahtdh May 06 '15 at 13:35
  • @gregouille: This is purely based on comments here since it wasn't part of your problem. If there is a possibility of escaping the backslash as well, as in the `"\\\\*"` that Pshemo mentioned, then you can use `input.replaceAll("(?<!(?<!\\\\)\\\\)\\*", ".*");` – anubhava May 06 '15 at 14:17
  • @anubhava: That is a dirty solution, which assumes a maximum number of ``\`` there can be in the input. Whenever there is escape syntax, doing "peephole" look-behind like this is only an incomplete solution at best. – nhahtdh May 07 '15 at 02:34
  • First of all that was only a comment. I don't want to give a solution for a problem which is not part of the question. Had it been in question my answer would have been different. – anubhava May 07 '15 at 03:37

You have to cater for the case of a string starting with * in your regex:

(^|[^\\\\])\\*

The single caret represents the 'beginning of the string' ( 'start anchor' ).

Edit

Apart from the correction above, the replacement string in the replaceAll call must be $1.* instead of .* lest a matched character before an unescaped * be lost.
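A small sketch of the difference between the two replacement strings, with the Java string escaping applied to the pattern above. The class and method names are hypothetical. It also suggests why the question's second test passed anyway: the character before the * was dropped, but "tot.*" still happens to match "tototo".

```java
final class AnchoredReplaceDemo {

    /** Keeps the character captured before the '*' by putting group 1 back. */
    static String expand(String input) {
        return input.replaceAll("(^|[^\\\\])\\*", "$1.*");
    }

    /** Same pattern, but the replacement omits $1, so the captured character is lost. */
    static String expandDroppingPrefix(String input) {
        return input.replaceAll("(^|[^\\\\])\\*", ".*");
    }

    public static void main(String[] args) {
        System.out.println(expand("*toto"));                // .*toto
        System.out.println(expand("toto*"));                // toto.*
        System.out.println(expand("\\*toto"));              // \*toto
        System.out.println(expandDroppingPrefix("toto*"));  // tot.*
    }
}
```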

collapsar