
I am really struggling with this question:

import java.util.regex.*;    
class Regex2 {    
    public static void main(String[] args) {    
        Pattern p = Pattern.compile(args[0]);    
        Matcher m = p.matcher(args[1]);    
        boolean b = false;    
        while(b = m.find()) {    
            System.out.print(m.start() + m.group());    
        }    
    }
}  

When the above program is run with the following command:

java Regex2 "\d*" ab34ef 

It outputs 01234456. I don't really understand this output. Consider the following indexes for each of the characters:

a b 3 4 e f
^ ^ ^ ^ ^ ^
0 1 2 3 4 5

Shouldn't the output have been 0123445?

I have been reading around, and it looks like the regex engine also reads the end of the string, but I just don't understand it. I would appreciate it if someone could provide a step-by-step guide to how it gets that result, i.e. how it finds each of the numbers.

VLAZ
ziggy

1 Answer


It is helpful to change

System.out.print(m.start() + m.group());

to

System.out.println(m.start() + ": " + m.group());

This way the output is much clearer:

0: 
1: 
2: 34
4: 
5: 
6: 

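You can see that it matched at 6 different positions: at position 2 it matched the string "34", and at every other position it matched an empty string. Position 3 never appears because the "34" match consumed indices 2 and 3, so the next search starts at index 4. An empty string also matches at the end of the input, position 6, just past the last character 'f' (index 5), which is why you see "6" at the end of your output.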
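A slightly extended version of the same loop that also prints m.end() makes each step of find() visible. This is a sketch: the class name MatchTrace and the helper trace are made up for illustration, and the pattern and input are hardcoded instead of being read from args. Because end() is exclusive, an empty match has start() == end(). After the "34" match (2-4) the next search begins at index 4, and after an empty match find() starts its next search one position further on so it does not return the same empty match again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // Collects "start-end: [group]" for every match that find() reports.
    static List<String> trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // end() is exclusive, so for an empty match start() == end()
            out.add(m.start() + "-" + m.end() + ": [" + m.group() + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hardcoded stand-ins for args[0] and args[1]
        for (String line : trace("\\d*", "ab34ef")) {
            System.out.println(line);
        }
    }
}
```

Running it should print 0-0: [], 1-1: [], 2-4: [34], 4-4: [], 5-5: [] and 6-6: [], one per line.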

Note that if you run the modified program like this:

java Regex2 "\d+" ab34ef

it will only output

2: 34
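The question's original single-line output can be reproduced the same way, for both patterns. Again this is a sketch with a hypothetical class OriginalOutput and helper originalOutput, with the arguments hardcoded. In Java source the backslash has to be doubled as "\\d*", while on the bash command line "\d*" inside double quotes already reaches the program as \d*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalOutput {
    // Rebuilds exactly what the question's loop prints with System.out.print.
    static String originalOutput(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Same as m.start() + m.group(): the start index followed by the matched text
            sb.append(m.start()).append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(originalOutput("\\d*", "ab34ef")); // 01234456
        System.out.println(originalOutput("\\d+", "ab34ef")); // 234
    }
}
```

With \d+ an empty string can no longer match, so the only match left is "34" at index 2.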
Adam Zalcman
  • @fge Nope. Shell does **not** turn "\d+" into "d+". See bash manpage. – Adam Zalcman Dec 18 '11 at 18:42
  • Yes indeed, sorry for the confusion... – fge Dec 18 '11 at 18:48
  • @AdamZalcman, can you elaborate on the **Empty string** at the end? Is there a hidden termination character, like C's `\0`? Why is there an empty char at the end of the string? Does the bash shell inject a `\n` or `\r`? I am confused as to why there are 6 indices. – sdc Sep 29 '16 at 08:54
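On the question about the end of the string: no hidden terminator is involved. A string of n characters has n + 1 positions where a match can start (before each character, plus one after the last), and a pattern that can match the empty string can match at every one of them. The sketch below uses the empty pattern to list those positions; the class EndOfInput and method emptyMatchPositions are hypothetical names, and the input is hardcoded so the shell plays no part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndOfInput {
    // Start positions of every match of the empty pattern, which matches at any position.
    static List<Integer> emptyMatchPositions(String input) {
        Matcher m = Pattern.compile("").matcher(input);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        // A Java String holds only its own characters; nothing like '\0' or '\n' is appended.
        System.out.println(input.length());             // 6
        System.out.println(emptyMatchPositions(input)); // [0, 1, 2, 3, 4, 5, 6]
    }
}
```

Six characters give seven positions, 0 through 6. With \d* the "34" match swallows position 3, which leaves the six indices seen in the output.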