I'm trying to learn Java regular expressions. I want to match a pattern with several capturing groups (e.g. j(a(va))) against another string (e.g. this is java. this is ava, this is va). I was expecting the output to be:

I found the text "java" starting at index 8 and ending at index 12.
I found the text "ava" starting at index 21 and ending at index 24.    
I found the text "va" starting at index 34 and ending at index 36.
Number of group: 2

However, the IDE outputs only:

I found the text "java" starting at index 8 and ending at index 12.
Number of group: 2

Why is this the case? Is there something I am missing?

Original code:

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("\nEnter your regex:");
        Pattern pattern = Pattern.compile(br.readLine());

        System.out.println("\nEnter input string to search:");
        Matcher matcher = pattern.matcher(br.readLine());

        boolean found = false;
        while (matcher.find()) {
            System.out.format("I found the text"
                    + " \"%s\" starting at "
                    + "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());
            found = true;
            System.out.println("Number of group: " + matcher.groupCount());
        }
        if (!found) {
            System.out.println("No match found.");
        }

After running the code above, I entered the following input:

Enter your regex:
j(a(va))

Enter input string to search:
this is java. this is ava, this is va

And the IDE outputs:

I found the text "java" starting at index 8 and ending at index 12.
Number of group: 2
halfer
Thor
  • Try using https://regex101.com/ – Scary Wombat Mar 03 '16 at 00:45
  • I think you misunderstand what capturing groups do. They don't make the other parts of the regexp optional, so your regexp only matches the whole string `java`. – Barmar Mar 03 '16 at 00:47
  • Please do not post questions that read from `System.in` and do something with the result, since this means you can either a) easily debug the code to identify an error with reading from `System.in`, or b) hardcode the strings. In both cases the code isn't a minimal example and/or the source of the error can easily be narrowed down. It also means more work to reproduce the problem. – fabian Mar 03 '16 at 00:49

2 Answers

Your regexp only matches the whole string java; it doesn't match ava or va. When it matches java, it sets capture group 1 to ava and capture group 2 to va, but it doesn't match those strings on their own. The regexp that would produce the result you want is:

j?(a?(va))

The ? makes the preceding item optional, so the regexp also matches ava and va, which lack the j and ja prefixes.
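To make the difference concrete, here is a minimal sketch that runs both regexps over the string from the question and prints each match together with its capture groups. The class name CaptureGroupDemo and the helper describeMatches are made up for illustration; only Pattern, Matcher, find(), group(), start(), end() and groupCount() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class CaptureGroupDemo {

    // Returns one line per match: the whole match with its [start,end) span,
    // followed by the text captured by each group of that match.
    static List<String> describeMatches(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            sb.append(String.format("\"%s\" [%d,%d)", m.group(), m.start(), m.end()));
            // Group 0 is the whole match; capture groups are numbered from 1.
            for (int g = 1; g <= m.groupCount(); g++) {
                sb.append(String.format(" group %d=\"%s\"", g, m.group(g)));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "this is java. this is ava, this is va";

        System.out.println("j(a(va)):");
        describeMatches("j(a(va))", input).forEach(System.out::println);

        System.out.println("j?(a?(va)):");
        describeMatches("j?(a?(va))", input).forEach(System.out::println);
    }
}
```

With the original regexp this prints a single line, "java" [8,12) group 1="ava" group 2="va": the groups are pieces of that one match, not extra matches. With the optional version it prints three lines, for java at [8,12), ava at [22,25) and va at [35,37). Note that j?(a?(va)) would also accept jva, since each ? is independent; if that matters, a stricter pattern is needed.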

DEMO

Barmar

You need the regex (j?(a?(va))):

Pattern p = Pattern.compile("(j?(a?(va)))");
Matcher m = p.matcher("this is java. this is ava, this is va");

while (m.find()) {
    String group = m.group();
    int start = m.start();
    int end = m.end();
    System.out.format("I found the text"
            + " \"%s\" starting at "
            + "index %d and ending at index %d.%n",
            group,
            start,
            end);
}

You can see demo here
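If it helps to see where each group lands, the sketch below lists the span of every group for each match of (j?(a?(va))). The class GroupSpanDemo and the method groupSpans are hypothetical names chosen for this example; Matcher.start(int) and Matcher.end(int) are the standard per-group accessors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, added for illustration only.
class GroupSpanDemo {

    // For each match, lists the [start,end) span of group 0 (the whole match)
    // and of every capture group. Assumes every group takes part in the match,
    // which holds for the regexp used in main.
    static List<String> groupSpans(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            for (int g = 0; g <= m.groupCount(); g++) {
                if (g > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("g%d=[%d,%d)", g, m.start(g), m.end(g)));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "this is java. this is ava, this is va";
        groupSpans("(j?(a?(va)))", input).forEach(System.out::println);
    }
}
```

For the first match this prints g0=[8,12) g1=[8,12) g2=[9,12) g3=[10,12). Group 1 always has the same span as group 0, so the outer parentheses add nothing beyond what m.group() already returns.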

SomeDude