I'm writing some very simple sample code using a regular expression, but I can't get it to work with groups.

The regular expression is: rowspan=([\\d]+)

The input string is: <td rowspan=66>x.x.x</td>

I tested it on an online regex engine, and the group 66 is obviously captured there:

[Screenshot: online regex tester showing the captured group 66]

Based on the javadoc,

Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().

So I think there should be two groups: group 0 should be rowspan=66 and group 1 should be 66. However, all I can get from the code below is the former.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String args[]){
        String input = "<td rowspan=66>x.x.x</td> ";
        String regex = "rowspan=([\\d]+)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        if(matcher.find()){
            for(int i = 0; i < matcher.groupCount(); i++){
                System.out.println(matcher.group(i));
            }
        }
    }

}

The output is:

rowspan=66

Thanks for your help in advance.

Eugene
  • I don't think this is an exact duplicate of the mentioned question. This pattern _does_ seem to contain groups. – Ward Jan 05 '18 at 07:28
  • [JavaDoc](https://docs.oracle.com/javase/9/docs/api/java/util/regex/Matcher.html#groupCount--): "Returns the number of capturing groups in this matcher's pattern. Group zero denotes the entire pattern by convention. **It is not included in this count**. " – user85421 Jan 05 '18 at 07:30
  • and is a duplicate of https://stackoverflow.com/q/12989917/85421 , https://stackoverflow.com/q/5716703/85421 , ... (stopped searching since someone thinks this is not a duplicate) – user85421 Jan 05 '18 at 07:38

3 Answers

I think the problem with your code has to do with understanding what the Matcher#groupCount method does. From the Javadoc:

Returns the number of capturing groups in this matcher's pattern. Group zero denotes the entire pattern by convention. It is not included in this count.

In other words, your for loop iterates only once, because the pattern has a single capture group and groupCount() therefore returns 1. That one iteration uses index 0, so it prints group 0, which is the entire match, and never reaches group 1:

for (int i=0; i < matcher.groupCount(); i++) {
    System.out.println(matcher.group(i));
}
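
To make the off-by-one visible, here is a minimal sketch that prints what groupCount() returns and which groups the strict `<` loop actually visits. The class name GroupCountDemo and the helper groupsVisitedByStrictLoop are made up for illustration; only the Pattern and Matcher calls come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {

    // Hypothetical helper: reproduces the question's loop (strict '<' bound)
    // and returns every group that loop visits for the first match.
    static List<String> groupsVisitedByStrictLoop(String regex, String input) {
        List<String> visited = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 0; i < matcher.groupCount(); i++) {
                visited.add(matcher.group(i));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        String input = "<td rowspan=66>x.x.x</td> ";
        String regex = "rowspan=([\\d]+)";
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // groupCount() counts only the parenthesized groups, not group 0.
        System.out.println(matcher.groupCount());                     // prints 1
        // With a count of 1, the strict loop only ever sees index 0.
        System.out.println(groupsVisitedByStrictLoop(regex, input));  // prints [rowspan=66]
    }
}
```

Since the count is 1, the valid indices are 0 and 1, and a loop bounded by `i < 1` stops before it gets to the captured 66.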

Instead, just iterate while you have a match, and then access the groups you need. I don't see much of a problem with hard coding the capture groups, because if a match occurred, then by definition the capture groups inside that match should be present as well.

String input = "<td rowspan=66>x.x.x</td> ";
String regex = "rowspan=(\\d+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group(0));
    System.out.println(matcher.group(1));
}

Note: Your pattern also looks a bit strange. If you want to match a digit via \\d, then you don't also have to put that into a character class. Hence I used the pattern rowspan=(\\d+) in my code.
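
If you want to convince yourself that the two spellings behave the same on this input, a quick check along these lines should do. The class DigitClassDemo and the helper firstCapture are hypothetical names used only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitClassDemo {

    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern does not match at all.
    static String firstCapture(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<td rowspan=66>x.x.x</td> ";
        // \d wrapped in a character class, as in the question.
        System.out.println(firstCapture("rowspan=([\\d]+)", input)); // prints 66
        // Plain \d, as used in this answer.
        System.out.println(firstCapture("rowspan=(\\d+)", input));   // prints 66
    }
}
```

The character class only earns its keep once it holds more than one thing, for example `[\\d.]` to also accept a dot.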

Tim Biegeleisen

I've always been a fan of named groups for regular expressions, and Java supports them via the special group construct (?<name>...). This makes retrieving the correct group easier, and you won't break anything if you later add another group earlier in the expression. It also has the side effect of removing any confusion regarding matcher.groupCount().

Change your regular expression to rowspan=(?<rowspan>[\\d]+)

And your code to:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args){
        String input = "<td rowspan=66>x.x.x</td> ";
        // The named group "rowspan" captures the digits after "rowspan=".
        String regex = "rowspan=(?<rowspan>[\\d]+)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        if(matcher.find()){
            System.out.println("Entire match: " + matcher.group());
            System.out.println("Row span: " + matcher.group("rowspan"));
        }
    }

}

And you'll get:

Entire match: rowspan=66
Row span: 66
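
To illustrate the point about adding a group earlier in the expression, here is a rough sketch. The extended pattern with the extra (td|th) group is an assumption made up for this example, as are the class NamedGroupDemo and the helper rowspanOf.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {

    // Hypothetical helper: returns the text captured by the named group
    // "rowspan" in the first match, or null when nothing matches.
    static String rowspanOf(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group("rowspan") : null;
    }

    public static void main(String[] args) {
        String input = "<td rowspan=66>x.x.x</td> ";
        String original = "rowspan=(?<rowspan>\\d+)";
        // Assumed later change: a new group is added in front of the named one.
        String extended = "<(td|th) rowspan=(?<rowspan>\\d+)";

        System.out.println(rowspanOf(original, input)); // prints 66
        System.out.println(rowspanOf(extended, input)); // prints 66

        // The numeric index has shifted: group 1 is now the tag name.
        Matcher matcher = Pattern.compile(extended).matcher(input);
        if (matcher.find()) {
            System.out.println(matcher.group(1)); // prints td
            System.out.println(matcher.group(2)); // prints 66
        }
    }
}
```

Code that asked for group(1) would silently start returning td after such a change, while the lookup by name keeps returning 66.
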
Raniz

Try

for(int i = 0; i <= matcher.groupCount(); i++){
    System.out.println(matcher.group(i));
}

matcher.groupCount() is 1, so if you use < you will iterate only on index 0.
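
For completeness, a small self-contained sketch of the inclusive loop, wrapped in a helper so it can be reused for every match. The class AllGroupsDemo, the helper allGroups, and the two-cell input string are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllGroupsDemo {

    // Hypothetical helper: collects group 0 through group groupCount()
    // of the matcher's current match. Call it only after find() returned true.
    static List<String> allGroups(Matcher matcher) {
        List<String> groups = new ArrayList<>();
        for (int i = 0; i <= matcher.groupCount(); i++) {
            groups.add(matcher.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Assumed input with two cells, to show the loop working per match.
        String input = "<td rowspan=66>x.x.x</td><td rowspan=3>y</td>";
        Matcher matcher = Pattern.compile("rowspan=([\\d]+)").matcher(input);
        while (matcher.find()) {
            System.out.println(allGroups(matcher));
        }
        // prints [rowspan=66, 66]
        // then   [rowspan=3, 3]
    }
}
```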

Ward