
This is strange, because it is a very simple regex for the dd/mm format. The result should be "Group 1: 14; Group 2: 12", but it is "Group 1: 14; Group 2: 1".

The second group captured only the first character and omitted the second one ('2' in the example).

String sDay = "(?:0?[1-9]|[12][0-9]|3[01])";
String sMonth = "(?:0?[1-9]|1[0-2])";
String sDot = "[\\.]";
String sSlash = "[/]";
String sMinus = "[\\-]";
String sSeparators = (sDot + "|" + sSlash + "|" + sMinus);

Pattern reDayMonth =
    Pattern.compile("(" + sDay + ")" + "(?:" + sSeparators + ")" + "(" + sMonth+ ")");

String s = "14/12";
Matcher reMatcher = reDayMonth.matcher(s);
boolean found = reMatcher.find();

System.out.println("Group 1: " + reMatcher.group(1) + "; Group 2: " + reMatcher.group(2));

I cannot understand why. Could you please help me?

ekad
Trang

1 Answer


In your month regex, the single-digit alternative is listed first, so it is tried first, matches, and the engine stops there. Try putting the two-digit alternative first and the single-digit one second:

(?:0?[1-9]|1[0-2])

should become:

(?:1[0-2]|0?[1-9])
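
As a quick check, here is a minimal sketch that runs both orderings against the input from the question. The class name MonthOrderDemo and the helper capture are made up for this illustration; the day and separator patterns are copied from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MonthOrderDemo {
    // Hypothetical helper: returns "group1|group2", or null when nothing matches.
    static String capture(String monthPattern, String input) {
        String day = "(?:0?[1-9]|[12][0-9]|3[01])";   // day pattern from the question
        String separators = "[\\.]|[/]|[\\-]";        // separators from the question
        Pattern p = Pattern.compile(
            "(" + day + ")" + "(?:" + separators + ")" + "(" + monthPattern + ")");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // Original ordering: the single-digit alternative wins for the month.
        System.out.println(capture("(?:0?[1-9]|1[0-2])", "14/12")); // 14|1
        // Reordered: the two-digit alternative is tried first.
        System.out.println(capture("(?:1[0-2]|0?[1-9])", "14/12")); // 14|12
    }
}
```

Single-digit months such as "14/3" still work with the reordered pattern, because 1[0-2] fails on "3" and the engine falls through to 0?[1-9].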

UPDATE (reasoning)
The same 0?-first ordering works in the day pattern but not in the month pattern because something must follow the day: the separator. The regex engine tries alternatives from left to right. For "14", the first alternative matches just "1", but the separator then fails to match "4", so the engine backtracks and tries the remaining alternatives until [12][0-9] matches "14". Nothing is required after the month pattern, so the engine accepts the first alternative that matches, which in the original pattern is a single digit.

If you reversed the input format (mm/dd instead of dd/mm) and simply swapped sDay and sMonth in the compiled regex, you would see the opposite: the month would correctly match two digits and the day would be cut down to one digit instead!
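
The swapped case can be tried directly. This is only a sketch: the class name SwappedOrderDemo and the helper captureMonthDay are invented here, and the sub-patterns are the unmodified ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwappedOrderDemo {
    // Hypothetical helper: matches mm/dd using the ORIGINAL sub-patterns, swapped.
    static String captureMonthDay(String input) {
        String day = "(?:0?[1-9]|[12][0-9]|3[01])";
        String month = "(?:0?[1-9]|1[0-2])";
        String separators = "[\\.]|[/]|[\\-]";
        Pattern p = Pattern.compile(
            "(" + month + ")" + "(?:" + separators + ")" + "(" + day + ")");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // The month is now followed by the separator, so the engine backtracks
        // from "1" to "12". The day is last, so its first alternative wins.
        System.out.println(captureMonthDay("12/14")); // 12|1
    }
}
```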

One way to resolve the issue is to match the two-character rule first and then the optional single-character rule, as my answer suggests. An alternative assumes (and requires) that the date is the only text in the input, i.e. the date starts at the very beginning and ends at the very end with no other text. If that is true, you can use the regex anchors ^ and $ to match the beginning and end of the input, respectively (or of each line, if Pattern.MULTILINE is set):

Pattern.compile("^(" + sDay + ")" + "(?:" + sSeparators + ")" + "(" + sMonth+ ")$");

With the anchors in place, the engine has to keep backtracking until the whole input matches, so for valid input you should always capture the full day and month.
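
A small sketch of the anchored variant, assuming the date is the entire input. The class and method names are hypothetical. It also shows Matcher.matches(), which requires the whole input to match and so has the same effect without writing ^ and $ explicitly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredDemo {
    static final String DAY = "(?:0?[1-9]|[12][0-9]|3[01])";
    static final String MONTH = "(?:0?[1-9]|1[0-2])";   // original, unmodified order
    static final String SEPARATORS = "[\\.]|[/]|[\\-]";
    static final String BODY =
        "(" + DAY + ")" + "(?:" + SEPARATORS + ")" + "(" + MONTH + ")";

    // Explicit anchors with find().
    static String anchoredFind(String input) {
        Matcher m = Pattern.compile("^" + BODY + "$").matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    // No anchors in the pattern; matches() demands a full-input match.
    static String fullMatch(String input) {
        Matcher m = Pattern.compile(BODY).matcher(input);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(anchoredFind("14/12"));     // 14|12
        System.out.println(fullMatch("14/12"));        // 14|12
        System.out.println(anchoredFind("due 14/12")); // null: extra text before the date
    }
}
```

The trade-off is that this approach cannot pick a date out of surrounding text, whereas reordering the alternatives can.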

SIDE NOTE (suggestion, not answer-specific though)
Per a useful suggestion by @MarkoTopolnik, you don't need the non-capturing group around each sub-pattern (month and day), since you immediately wrap each one in a capturing group, which makes the non-capturing group redundant. So the month pattern above could simply become:

1[0-2]|0?[1-9]
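
To illustrate, here is a sketch with the non-capturing groups dropped. The names NoInnerGroupDemo and capture are made up for the example. It relies on the capturing parentheses added at compile time to scope each alternation, so the bare sub-patterns must not be concatenated without them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NoInnerGroupDemo {
    // Hypothetical helper: same regex as before, minus the (?: ... ) wrappers.
    static String capture(String input) {
        String day = "0?[1-9]|[12][0-9]|3[01]";
        String month = "1[0-2]|0?[1-9]";            // two-digit alternative first
        String separators = "[\\.]|[/]|[\\-]";
        // The capturing parentheses here are what keep each "|" local.
        Pattern p = Pattern.compile(
            "(" + day + ")" + "(?:" + separators + ")" + "(" + month + ")");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("14/12")); // 14|12
        System.out.println(capture("7-3"));   // 7|3
    }
}
```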
newfurniturey
  • Doesn't this fail in the capture group department? Actually, I think the non-capture group parens are redundant. – Marko Topolnik Oct 25 '12 at 12:54
  • OP should remove the non-capture group parens, I think. Everything will stay the same. – Marko Topolnik Oct 25 '12 at 12:56
  • "Try moving the required-two-digit month to check first and then the single digit: (?:1[0-2]|0?[1-9])" -> Thank newfurniturey so much. That's the point!!! – Trang Oct 25 '12 at 12:59
  • @newfurniturey: I still have one concern: why is the day still captured in two-digit format, but the month isn't? – Trang Oct 25 '12 at 13:15
  • @newfurniturey: Sorry, I had posted the wrong version (which did not include the option for 0). Now it's the right one. Could you check again for me? Thanks. – Trang Oct 25 '12 at 13:19
  • @newfurniturey: Sorry for the bad explanation. I mean your solution works well for my problem. Thanks. However, I have an additional, related concern: with the source code in my question, why does sDay capture both single-digit and two-digit values, while sMonth captures only a single digit? – Trang Oct 25 '12 at 13:32
  • The problem I asked about is completely solved by your answer. My concern is with the original source code, not with the fixed version in your answer. sDay and sMonth have the same structure, yet sDay can capture two digits while sMonth can capture only one... I just want to understand the solution better. – Trang Oct 25 '12 at 14:01
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/18587/discussion-between-newfurniturey-and-trang) – newfurniturey Oct 25 '12 at 15:28