I have a string that begins with one or more occurrences of the sequence "Re:". The "Re:" can appear in any combination of case and spacing, e.g. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.

Sample string: Re: Re : Re : re : RE: This is a Re: sample string.

I want to define a Java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.

So the output should be: This is a Re: sample string.

Here is what I have tried:

String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
  m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);

I am using \p{Z} to match whitespace (I found this somewhere on this forum, as Java regex does not identify \s).

The problem I am facing with this code is that the search stops at the first match and exits the while loop.

tckmn
Saad
  • Umm... Java regex does identify `\s`... – tckmn Jun 25 '13 at 19:06
  • @Doorknob I was referring to the post [link](http://stackoverflow.com/questions/5601754/regex-allowing-a-space-character-in-java) and [link](http://stackoverflow.com/questions/4731055/whitespace-matching-regex-java) – Saad Jun 25 '13 at 19:11
  • those posts are completely unrelated – tckmn Jun 25 '13 at 19:13

2 Answers

Try something like this replace statement:

yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");

Explanation of the regex:

(?i)  make it case insensitive
^     anchor to start of string
(     start a group (this is the "re:")
\\s*  any amount of optional whitespace
re    "re"
\\s*  optional whitespace
:     ":"
\\s*  optional whitespace
)     end the group (the "re:" string)
+     one or more times
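
As a minimal sketch of the pattern above applied to the sample string from the question: the class name `ReStripper` and the method `stripLeadingRe` are made up for illustration, and `Pattern.CASE_INSENSITIVE` is used here in place of the inline `(?i)` flag, which should behave the same for this pattern.

```java
import java.util.regex.Pattern;

class ReStripper {
    // Compiled once and reused; CASE_INSENSITIVE stands in for the inline (?i) flag.
    private static final Pattern LEADING_RE =
            Pattern.compile("^(\\s*re\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    // Removes the run of "Re:" prefixes at the start of the title, if any.
    static String stripLeadingRe(String title) {
        // The pattern is anchored with ^, so it can match at most once.
        return LEADING_RE.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String input = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(stripLeadingRe(input));
        // prints: This is a Re: sample string.
    }
}
```

Because the group is repeated with `+` right after the `^` anchor, the whole run of prefixes is consumed in a single match, so no loop is needed.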
tckmn
  • +1. `replace()` doesn't take a regex; you need `replaceAll()`. Other than that, exactly my solution; you were 30 sec. faster. – jlordo Jun 25 '13 at 19:11
  • @jlordo Thanks, forgot about that `:)` – tckmn Jun 25 '13 at 19:11
  • Thanks for your quick response and detailed explanations. Will try that and let you know. – Saad Jun 25 '13 at 19:17
  • Thank you so much @Doorknob. This is exactly what I needed. And thanks to the explanation, I got to know better how this thing works. Would hopefully be able to solve similar problems in future. Can't thank you enough. :) – Saad Jun 25 '13 at 19:31
In your regex:

String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";

Here is what it does:

[Image: diagram of the regular expression, with a link to a live version]

it matches strings like:

  • \p{Z}Reee\p{Z: or
  • R\p{Z}}}

which makes no sense for what you are trying to do.

You'd be better off using a regex like the following:

yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");

or to make @Doorknob happy, here's another way to achieve this, using a Matcher:

Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
    yourString = m.replaceAll("");

(which is, as the doc says, exactly the same thing as yourString.replaceAll())
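
A small sketch comparing the two forms side by side, assuming the same regex as above. The class and method names are hypothetical. `Matcher.replaceAll` resets the matcher before replacing and returns the input unchanged when nothing matches, so the `find()` guard is left out here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllEquivalence {
    static final String REGEX = "(?i)^(\\s*re\\s*:\\s*)+";

    // Shortcut form: String.replaceAll compiles the pattern on every call.
    static String viaString(String s) {
        return s.replaceAll(REGEX, "");
    }

    // Explicit form: Matcher.replaceAll resets the matcher first,
    // so no prior find() call is required.
    static String viaMatcher(String s) {
        Matcher m = Pattern.compile(REGEX).matcher(s);
        return m.replaceAll("");
    }

    public static void main(String[] args) {
        String title = "RE : re: Hello";
        System.out.println(viaString(title));   // prints: Hello
        System.out.println(viaMatcher(title));  // prints: Hello
    }
}
```

The explicit `Pattern`/`Matcher` form is mainly worth it when the same pattern is applied to many strings, since the compiled `Pattern` can then be kept in a constant and reused.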

[Image: diagram of the regular expression, with a link to look it up]

(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )

zmo
  • -1 because 1. most of your post is useless (except the regex), and 2. you just copy/pasted my regex – tckmn Jun 25 '13 at 19:15
  • Well first, I *did* not copy your regex; there aren't many regex solutions to such a question (I could have written `\s*(re|Re|RE):\s*` just to make it different, but it's better to use a case-insensitive switch). 2/ I don't think showing pictures of NFA is useless to people who write incorrect regex, as 90% of the time they get it wrong because they don't know that a regex is just a syntax to create NFA-based parsers! – zmo Jun 25 '13 at 19:21
  • @Doorknob and if you still believe I need to copy a regex to make an answer, look at my other regex answers to prove me right. – zmo Jun 25 '13 at 19:22
  • ...then why is your regex the *exact same* as mine, and you posted your answer 4 minutes after mine? Also the NFA would have been better in a comment; it does not help solve his problem. – tckmn Jun 25 '13 at 19:22
  • Well, I don't see how I can make it different! The `^` is mandatory as he wants only the first occurrences. The grouping is also mandatory to make it match one or more occurrences. The 0 or more spaces are needed too, as he shows there can be zero or more such spaces... And finally, the NFA are useful to show the OP he's wrong, and *why* he is wrong. AFAICT, answering on SO is not only about giving the correct answer, but also about explaining to the OP why he is wrong! And showing counter-examples is a good thing. – zmo Jun 25 '13 at 19:27
  • Then you just shouldn't have posted an answer that is the exact same as an existing answer. – tckmn Jun 25 '13 at 19:28
  • In Java, `\p{Z}` (`"\\p{Z}"` in string-literal form) does not match the literal sequence `\p{Z}`; it matches any character that has been assigned the Unicode general category `Separator`, and `\p{Z}*` matches zero or more such characters. That category includes the ASCII space character (`U+0020`) but not tabs or newlines, which are control characters, so you have to use `\s` to match those. – Alan Moore Jun 25 '13 at 21:22
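
A quick check that can be run to see which characters `\p{Z}` and `\s` each match in Java. This is a sketch with a hypothetical class and helper name; it uses the default `\s` (without `Pattern.UNICODE_CHARACTER_CLASS`), which covers space, tab, newline, vertical tab, form feed and carriage return.

```java
import java.util.regex.Pattern;

class SeparatorVsWhitespace {
    // Returns true if the whole one-character string matches the given class.
    static boolean matches(String regex, String ch) {
        return Pattern.matches(regex, ch);
    }

    public static void main(String[] args) {
        // ASCII space: Unicode category Zs, and also covered by \s.
        System.out.println(matches("\\p{Z}", " "));       // true
        System.out.println(matches("\\s", " "));          // true

        // Tab: a control character (category Cc), so only \s matches.
        System.out.println(matches("\\p{Z}", "\t"));      // false
        System.out.println(matches("\\s", "\t"));         // true

        // No-break space U+00A0: category Zs, not in the default \s set.
        System.out.println(matches("\\p{Z}", "\u00A0"));  // true
        System.out.println(matches("\\s", "\u00A0"));     // false
    }
}
```

If the titles may contain both kinds of characters, a class such as `[\s\p{Z}]` covers the union of the two sets.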