How can I get the content for a group with an asterisk?

For example, I'd like to parse a comma-separated list, e.g. 1,2,3,4,5.

private static final String LIST_REGEX = "^(\\d+)(,\\d+)*$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);

public static void main(String[] args) {
    final String list = "1,2,3,4,5";
    final Matcher matcher = LIST_PATTERN.matcher(list);
    System.out.println(matcher.matches());
    // groupCount() excludes group 0, so the loop must run up to and including n.
    for (int i = 0, n = matcher.groupCount(); i <= n; i++) {
        System.out.println(i + "\t" + matcher.group(i));
    }
}

And the output is

true
0   1,2,3,4,5
1   1
2   ,5

How can I get every single entry, i.e. 1, 2, 3, ...?

I am looking for a general solution; this is only a demonstrative example.
Please imagine a more complicated regex such as ^\\[(\\d+)(,\\d+)*\\]$ to match a list like [1,2,3,4,5].

Unihedron
Vertex
  • For your second example, the easiest approach is probably to use a regex to get what is between the [] and then use split. Extracting each entry with a regex would be much less efficient. – alkino Sep 15 '14 at 23:39

2 Answers

You can use String.split().

for (String segment : "1,2,3,4,5".split(","))
    System.out.println(segment);

Or you can capture repeatedly by calling find() in a loop:

Pattern pattern = Pattern.compile("(\\d+),?");
Matcher m = pattern.matcher("1,2,3,4,5");
// Each call to find() advances to the next entry.
while (m.find())
    System.out.println(m.group(1));

For the second example you added, you can do a similar match.

for (String segment : "!!!!![1,2,3,4,5] //"
                          .replaceFirst("^\\D*(\\d+(?:,\\d+)*)\\D*$", "$1")
                          .split(","))
    System.out.println(segment);

I made an online code demo. I hope this is what you wanted.


How can I get all the matches (zero, one or more) for an arbitrary group with an asterisk (xyz)*? [The group is repeated and I would like to get every repeated capture.]

No, you cannot. Regex Capture Groups and Back-References explains why:

The Returned Value for a Given Group is the Last One Captured

Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
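The overwrite behaviour quoted above can be reproduced in Java, together with the usual workaround of running the inner pattern on its own with find(). This is only a sketch: the class RepeatedGroupDemo and the helper allMatches are hypothetical names, and the workaround assumes the repeated element can be matched independently of its surroundings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Hypothetical helper: collects every match of elementRegex found in input.
    static List<String> allMatches(String elementRegex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(elementRegex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // A quantified group keeps only its last repetition.
        Matcher whole = Pattern.compile("([A-Z]_)+").matcher("A_B_C_D_");
        System.out.println(whole.matches()); // prints true
        System.out.println(whole.group(1));  // prints D_

        // Matching the element pattern by itself recovers every repetition.
        System.out.println(allMatches("[A-Z]_", "A_B_C_D_")); // prints [A_, B_, C_, D_]
    }
}
```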

Unihedron
  • Thanks. For this example your solution is the easiest way, but this is more of a common question. – Vertex Sep 15 '14 at 23:23
  • @Vertex What's the common question? Is it that you want to match digits between commas from a list, or you want to match all digits? – Unihedron Sep 15 '14 at 23:31
  • The common question is how to deal with groups that have an asterisk: `(xyz)*`. The definition says that the regex `xyz` can appear zero, one or more times, and I'd like to get all the matches. In the _special_ list example above, that means I want to get all these matches: `1`, `2`, `3`,... for `(,\\d+)*`. I voted +1 because other people may search for this _special_ problem, but I am not :) – Vertex Sep 16 '14 at 07:04
  • @Vertex You cannot make a quantified capturing group keep every repetition. When a capturing group is repeated in a match, only the last match is remembered. I'm going to update my answer and explain why this is so. – Unihedron Sep 16 '14 at 07:18

I assume you may be looking for something like the following; it handles both of your examples.

private static final String LIST_REGEX = "^\\[?(\\d+(?:,\\d+)*)\\]?$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);

public static void main(String[] args) {
    final String list = "[1,2,3,4,5]";
    final Matcher matcher = LIST_PATTERN.matcher(list);

    // matches() must succeed before group(1) can be read.
    final boolean matched = matcher.matches();
    System.out.println(matched);
    if (!matched) {
        return;
    }

    // Group 1 holds the whole comma-separated body; split it into entries.
    int i = 0;
    System.out.println(i + "\t" + matcher.group(1));
    for (String x : matcher.group(1).split(",")) {
        i++;
        System.out.println(i + "\t" + x);
    }
}

Output

true
0   1,2,3,4,5
1   1
2   2
3   3
4   4
5   5
hwnd
  • I voted +1 because other people may search for this special problem and your solution for it. But I won't accept the answer because I want to know how I can get all the matches (zero, one or more) for an arbitrary group with an asterisk `(xyz)*` – Vertex Sep 16 '14 at 07:07