I have the following code. As far as I can see, the program should print 0123445. Instead, it prints 01234456.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex2 {

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        boolean b = false;
        while (b = m.find()) {
            System.out.print(m.start() + m.group());
        }
        System.out.println();
    }

}

I think the following should happen, since the search pattern is \d*:

  1. It finds a hit at position 0, but since the hit is not a digit, it just prints 0.
  2. It finds a hit at position 1, but again, since the hit is not a digit, it just prints 1.
  3. It finds a hit at position 2, and since we are looking for \d*, the hit is 34, so it prints 234.
  4. It moves to position 4 and finds a hit, but since the hit is not a digit, it just prints 4.
  5. It moves to position 5 and finds a hit, but since the hit is not a digit, it just prints 5.

At this point, as far as I can see, it should be done. But for some reason, the program also prints a 6.

I would much appreciate it if someone could explain.

user3516726
  • 2
    Change the printing to: `System.out.println("pos: "+m.start() + "; match:" + m.group());` and you'll understand what's going on – Nir Alfasi Sep 07 '14 at 23:25
  • 2
    The empty pattern (which is what you have effectively for positions that are not a digit) matches both "before" and "after" a character, so it matches at the end of the string as well. – Jim Garrison Sep 07 '14 at 23:27

1 Answer

The pattern \d* matches zero (!) or more digits, which is why it returns an empty string as a match at positions 0 and 1. It then matches 34 at position 2, and an empty string again at positions 4 and 5. At that point, what is left to match against is an empty string. This empty string also matches \d* (because an empty string contains zero digits), which is why there is another match at position 6, just past the last character.
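
To make the empty matches visible, here is a minimal sketch that records the start index, end index, and matched text for every successful find(). The class name EmptyMatchDemo and the helper describeMatches are made up for this illustration; only Pattern and Matcher come from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class EmptyMatchDemo {

    // Returns one "start-end:'text'" entry per successful find().
    static List<String> describeMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return result;
    }

    public static void main(String[] args) {
        // "ab34ef" has length 6, so index 6 is the position just past the last character.
        for (String match : describeMatches("\\d*", "ab34ef")) {
            System.out.println(match);
        }
    }
}
```

Running it should list six matches: empty ones at 0, 1, 4, 5 and 6 (where start equals end), plus the single non-empty match 34 spanning 2 to 4. The entry at 6 is the empty match at the end of the input.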

For contrast, try using \d+ (which matches one or more digits) as the pattern and see what happens.
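
A small sketch of that comparison, assuming the same input string as in the question. The class QuantifierContrast and the method trace are hypothetical names for this example; trace builds the same string the original loop prints (start index followed by the matched text).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class QuantifierContrast {

    // Concatenates m.start() and m.group() for every match, like the original loop.
    static String trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \d* can match the empty string, so it matches at every position, including the end.
        System.out.println(trace("\\d*", "ab34ef")); // prints 01234456
        // \d+ needs at least one digit, so the only match is "34" at index 2.
        System.out.println(trace("\\d+", "ab34ef")); // prints 234
    }
}
```

With \d+ the empty matches disappear, because an empty string does not contain at least one digit.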

AVee