
I am working on a requirement and need to write a regex for the following string:

startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]

There can be several variations of this string:

startDate:[*;2016-10-12T12:23:23Z]
startDate:[2016-10-12T12:23:23Z;*]
startDate:[*;*]

In the expressions above, startDate is a key name that can be anything, such as endDate or updateDate, so it can't be hardcoded in the expression. The key name can be any word made of the characters [a-zA-Z_0-9]*.
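One detail about the key class: with `*` the key may also be empty, so an input such as `:[*;*]` would still match. Below is a minimal sketch, assuming the key must be non-empty (the question does not say), that uses `+` instead. `KeyNameCheck` and `isValidKey` are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

public class KeyNameCheck {
    // One or more of [a-zA-Z_0-9]; '+' rejects an empty key, whereas '*' would accept it
    private static final Pattern KEY = Pattern.compile("[a-zA-Z_0-9]+");

    static boolean isValidKey(String key) {
        return KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"startDate", "endDate", "updateDate", "update_date2", ""}) {
            System.out.println("'" + key + "' -> " + isValidKey(key));
        }
    }
}
```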

I am using the following compiled pattern:

Pattern.compile("([[a-zA-Z_0-9]*):(\\[[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]];[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]]\\]])");

The pattern matches, but the groups it creates are not what I expect. I want the groups marked by the parentheses below:

(startDate):([*:2016-10-12T12:23:23Z])

group1 = "startDate"
group2 = "[*;2016-10-12T12:23:23Z]"

Could you please help me with the correct expression and groups in Java?

Vishal
    `but the groups created are not what I expect.` ... what are the current groups? – Tim Biegeleisen Oct 07 '16 at 05:40
  • dateMatcher.group(0) = "s" dateMatcher.group(1) = "s" @TimBiegeleisen – Vishal Oct 07 '16 at 05:43
  • `startDate:[*:*]` can you narrow this down a bit? There is no point even checking for timestamps on either side of the colon if you will accept anything there. – Tim Biegeleisen Oct 07 '16 at 05:54
  • @TimBiegeleisen I have to extract the time stamps around the colon to do some date range comparisons later. How do you suggest narrowing it down? – Vishal Oct 07 '16 at 05:59
  • If a timestamp doesn't appear, what else could appear there? You need at least semi-fixed structure to write a robust regex here. – Tim Biegeleisen Oct 07 '16 at 06:01
  • @TimBiegeleisen It seems as if the `*` is the character that appears in the input string in case of having no timestamp. The `[\\*]` part in OP's regex shows that, too. – Seelenvirtuose Oct 07 '16 at 06:06
  • @Seelenvirtuose Thanks for pointing this out :-) – Tim Biegeleisen Oct 07 '16 at 06:06
  • @TimBiegeleisen The fixed structure of the timestamp is "YYYY-MM-ddTHH:mm:ssZ". Let's call "YYYY-MM-ddTHH:mm:ssZ" the 'timestamp'. The expression in square brackets can be [timestamp;*], [*;timestamp], [timestamp:timestamp] or [*;*] – Vishal Oct 07 '16 at 06:07
  • Character classes are defined with `[....]`, not `[[...]]`. Your whole pattern matches one single char – Wiktor Stribiżew Oct 07 '16 at 06:09
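Putting the comments together, each side of the range is either `*` or a fixed-format timestamp, so one reusable sub-pattern per side is enough. The following is a sketch, not a definitive answer: since the question shows both `;` and `:` between the two sides, it assumes either separator is allowed (`[;:]`). Groups are plain parentheses only around the key and the bracketed range, so group 1 and group 2 are exactly the ones requested; the alternation inside uses the non-capturing `(?: )` form. `DateRangeRegex`, `SIDE` and `RANGE` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRangeRegex {
    // One side of the range: either a literal '*' or a timestamp yyyy-MM-ddTHH:mm:ssZ
    private static final String SIDE = "(?:\\*|\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z)";

    // group 1 = key name (\w is [a-zA-Z_0-9] by default), group 2 = whole bracketed range.
    // Assumption: the separator may be ';' or ':' because the question shows both.
    static final Pattern RANGE = Pattern.compile("(\\w+):(\\[" + SIDE + "[;:]" + SIDE + "\\])");

    public static void main(String[] args) {
        String[] inputs = {
            "startDate:[2016-10-12T12:23:23Z;2016-10-12T12:23:23Z]",
            "startDate:[*;2016-10-12T12:23:23Z]",
            "endDate:[2016-10-12T12:23:23Z;*]",
            "updateDate:[*;*]"
        };
        for (String input : inputs) {
            Matcher m = RANGE.matcher(input);
            if (m.matches()) {
                System.out.println("group1 = " + m.group(1) + ", group2 = " + m.group(2));
            } else {
                System.out.println("no match: " + input);
            }
        }
    }
}
```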

2 Answers


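You are using square brackets [ ] rather than parentheses ( ) to wrap the alternatives (i.e. the options separated by |). Square brackets define a character class, which matches a single character.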
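A small sketch of the difference, using `Pattern.matches` (which requires the whole input to match): inside `[...]` the characters `*` and `|` are ordinary literals and the class consumes exactly one character, while `(...|...)` chooses between whole sub-patterns. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class ClassVersusGroup {
    public static void main(String[] args) {
        // [...] is a character class: it matches exactly ONE character from the set.
        // Inside it, '*' and '|' are literal characters, not operators.
        System.out.println(Pattern.matches("[*|0-9]", "*"));     // true
        System.out.println(Pattern.matches("[*|0-9]", "|"));     // true
        System.out.println(Pattern.matches("[*|0-9]", "2016"));  // false: four characters

        // (...|...) is a group with alternation: each branch can be a whole sub-pattern.
        System.out.println(Pattern.matches("(\\*|[0-9]{4})", "*"));    // true
        System.out.println(Pattern.matches("(\\*|[0-9]{4})", "2016")); // true
    }
}
```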

For example, the following code works for me:

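import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "startDate:[2016:*]"; // sample input using only the year
Pattern pattern = Pattern.compile("(\\w+):(\\[(\\*|\\d{4}):\\*\\])");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
    // group 0 is the whole match; groups 1..groupCount() are the capturing groups
    for (int i = 0; i <= matcher.groupCount(); i++) {
        System.out.println(i + ":" + matcher.group(i));
    }
} else {
    System.out.println("no match");
}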

To keep things simple I only match the year, but the same approach should work with the full timestamp pattern.
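Here is one way the year-only pattern could be extended to the full timestamp on both sides, as a sketch. It assumes `;` separates the two sides, as in the question's variations. Keeping the inner alternations as capturing groups means groups 3 and 4 hold the individual sides, which may be handy for the later date comparisons the asker mentions. `FullTimestampGroups`, `TS` and `PATTERN` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FullTimestampGroups {
    // Full timestamp yyyy-MM-ddTHH:mm:ssZ in place of the year-only \d{4}
    static final String TS = "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z";

    // Groups: 1 = key, 2 = bracketed range, 3 = left side, 4 = right side.
    // Assumption: ';' separates the two sides.
    static final Pattern PATTERN = Pattern.compile(
            "(\\w+):(\\[(\\*|" + TS + ");(\\*|" + TS + ")\\])");

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("startDate:[*;2016-10-12T12:23:23Z]");
        if (matcher.matches()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println(i + ":" + matcher.group(i));
            }
        } else {
            System.out.println("no match");
        }
    }
}
```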

This expression captures more groups than you need, but you can make the extra ones non-capturing with the (?: ) construct.
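To illustrate, here is the answer's simplified pattern side by side with a variant where only the alternation is wrapped in `(?: )`. The grouping still scopes the `|`, but the group count drops from 3 to 2 and group 2 is unchanged. The class name and sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingVersusNonCapturing {
    // The simplified pattern: the alternation is a capturing group (group 3)
    static final Pattern CAPTURING = Pattern.compile("(\\w+):(\\[(\\*|\\d{4}):\\*\\])");
    // Same pattern with (?: ) around the alternation: it still groups '|' but captures nothing
    static final Pattern NON_CAPTURING = Pattern.compile("(\\w+):(\\[(?:\\*|\\d{4}):\\*\\])");

    public static void main(String[] args) {
        String text = "startDate:[2016:*]"; // hypothetical year-only input
        for (Pattern p : new Pattern[] {CAPTURING, NON_CAPTURING}) {
            Matcher m = p.matcher(text);
            if (m.matches()) {
                System.out.println(m.groupCount() + " groups, group2 = " + m.group(2));
            }
        }
    }
}
```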

Note that I also simplified parts of your regex with the predefined character classes \w and \d. See http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for more details.
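According to the `Pattern` documentation, `\w` stands for `[a-zA-Z_0-9]` and `\d` for `[0-9]` unless the `UNICODE_CHARACTER_CLASS` flag is set, so the shorthand should accept exactly the same keys as the question's explicit class. A quick sketch to check that on a few hypothetical keys, including one with a non-ASCII letter that neither form accepts by default:

```java
import java.util.regex.Pattern;

public class WordClassCheck {
    // Compares the shorthand \w+ with the explicit class [a-zA-Z_0-9]+ on one input
    static boolean sameResult(String s) {
        boolean shorthand = Pattern.matches("\\w+", s);
        boolean explicit = Pattern.matches("[a-zA-Z_0-9]+", s);
        return shorthand == explicit;
    }

    public static void main(String[] args) {
        String[] keys = {"startDate", "end_date2", "start-date", "startD\u00e4tum"};
        for (String s : keys) {
            System.out.println(s + " -> matches \\w+: " + Pattern.matches("\\w+", s)
                    + ", same as explicit class: " + sameResult(s));
        }
    }
}
```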

sprinter

Here is a solution which uses your original regex, modified so that it actually returns the groups you want:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String content = "startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]";
Pattern pattern = Pattern.compile("([a-zA-Z_0-9]*):(\\[(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*):(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*)\\])");
Matcher matcher = pattern.matcher(content);
// find() must return true before the groups can be accessed;
// otherwise group() throws IllegalStateException
if (matcher.find()) {
    System.out.println("group1 = " + matcher.group(1));
    System.out.println("group2 = " + matcher.group(2));
}

Output:

group1 = startDate
group2 = [2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]

I tested this code in IntelliJ and it appears to work correctly.
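Since the asker mentioned needing the timestamps for date range comparisons later, here is a possible next step, as a sketch under several assumptions: `:` separates the sides (as in the `content` string above), `*` means an open bound, and both bounds are inclusive. Each side is captured separately and parsed with `java.time.Instant.parse`, which accepts the `yyyy-MM-ddTHH:mm:ssZ` form. `RangeCheck`, `contains` and `parseSide` are hypothetical names.

```java
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeCheck {
    private static final String SIDE = "(\\*|\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z)";
    // Groups: 1 = key, 2 = lower bound, 3 = upper bound.
    // Assumption: ':' separates the two sides.
    private static final Pattern RANGE = Pattern.compile("(\\w+):\\[" + SIDE + ":" + SIDE + "\\]");

    // '*' means "no bound", represented here as null
    private static Instant parseSide(String side) {
        return side.equals("*") ? null : Instant.parse(side);
    }

    /** Returns true if 'when' lies within the inclusive range in 'expr'; throws if expr is malformed. */
    static boolean contains(String expr, Instant when) {
        Matcher m = RANGE.matcher(expr);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a range expression: " + expr);
        }
        Instant from = parseSide(m.group(2));
        Instant to = parseSide(m.group(3));
        return (from == null || !when.isBefore(from)) && (to == null || !when.isAfter(to));
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2016-10-12T12:23:23Z");
        System.out.println(contains("startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]", when)); // true
        System.out.println(contains("startDate:[*:2016-01-01T00:00:00Z]", when));                    // false
        System.out.println(contains("startDate:[*:*]", when));                                       // true
    }
}
```

Representing `*` as `null` keeps the comparison in one expression; an alternative would be to map it to `Instant.MIN` or `Instant.MAX`.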

Tim Biegeleisen