Given:

import java.util.regex.*;

class Regex2 {
  public static void main(String args[]) {
    Pattern p = Pattern.compile(args[0]);
    Matcher m = p.matcher(args[1]);
    boolean b = false;

    while (m.find()) {
      System.out.print(m.start() + m.group());
    }
  }
}

The command line is:

java Regex2 "\d*" ab34ef

What is the result?

A. 234
B. 334
C. 2334
D. 0123456
E. 01234456
F. 12334567
G. Compilation fails

The SCJP book explains regex, Pattern and Matcher so badly it's unbelievable. Anyway, I understand most of the basics and have read the Sun/Oracle documentation about greedy and reluctant quantifiers. I understand the concepts but am a bit blurry about a few things:

What exactly is the symbol for a "greedy" quantifier? Is it simply a single *, ? or +? If so, can someone explain in detail how the answer turns out to be E according to the book? When I run it myself I get the answer: 2334!

Here we would be using a greedy quantifier, correct? It would consume the entire string and then backtrack, looking for zero or more digits in a row. So, if it is greedy, the full string contains 2 digits in a row and .find() should succeed only once (i.e. m.start() = 0, m.group() = "ab34ef"), by that definition!

Thanks for the help guys.

Pshemo
Nilay Panchal

1 Answer

These are the matches of \d* against "ab34ef":

  • index 0: zero-width;
  • index 1: zero-width;
  • index 2: "34";
  • index 4: zero-width;
  • index 5: zero-width;
  • index 6: zero-width.
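
To check the list above yourself, here is a minimal sketch. The class name `MatchLister` and the helper `describeMatches` are made up for illustration; only `Pattern`, `Matcher`, `find()`, `start()` and `group()` come from `java.util.regex`. Note that inside Java source the pattern has to be written as `"\\d*"`, whereas on the command line the program receives `\d*` directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLister {
    // Hypothetical helper: one line per successful find(),
    // in the same "index N: ..." form as the list above.
    static List<String> describeMatches(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // An empty group() means the match consumed no characters.
            String text = m.group().isEmpty()
                    ? "zero-width"
                    : "\"" + m.group() + "\"";
            lines.add("index " + m.start() + ": " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeMatches("\\d*", "ab34ef")) {
            System.out.println(line);
        }
    }
}
```

Concatenating each start index with its matched text, as the program in the question does, gives `0`, `1`, `234`, `4`, `5`, `6`, i.e. `01234456`.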

This should explain your output. If the quantifier were reluctant (`\d*?`), the only difference would be that the "34" match at index 2 is replaced by two zero-width matches:

  • index 2: zero-width;
  • index 3: zero-width.

The reluctant quantifier grabs as little as allowed to make the entire expression match.
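
A small side-by-side sketch may make this clearer. The class `QuantifierCompare` and the helper `trace` are hypothetical names; `trace` just builds the same string the program in the question prints. The third pattern, `.*`, is my own addition to show that greediness only applies to what the quantified token can match: `\d*` can never swallow `a` or `b`, because `\d` matches digits only, while `.` matches any character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierCompare {
    // Hypothetical helper: concatenates start index and matched text
    // for every find(), like the print statement in the question.
    static String trace(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            sb.append(m.start()).append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        // Greedy: takes as many digits as it can at each position.
        System.out.println(trace("\\d*", input));   // 01234456
        // Reluctant: takes as few as it can, so every match is empty.
        System.out.println(trace("\\d*?", input));  // 0123456
        // Greedy on '.', which matches any character: the first match
        // is the whole string, followed by an empty match at the end.
        System.out.println(trace(".*", input));     // 0ab34ef6
    }
}
```

So the "consume everything, then back off" picture only produces a single whole-string match when the quantified token is something like `.`; with `\d`, the greedy quantifier stops at the first non-digit.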

Marko Topolnik
  • Thanks Marko, that tells me how we get the output. Why isn't it used as a greedy quantifier though, is my question! – Nilay Panchal Jun 04 '13 at 09:56
  • I'm sorry, I didn't understand the difference between greedy and reluctant at all from this. Can you elaborate? Why didn't it take the entire input string as a whole first, since it's greedy, and therefore find 2 digits in the input string and return immediately? Or if it needs to be trailing digits, why not backtrack from the last two letters and zero in on the string ab34? – Nilay Panchal Jun 04 '13 at 10:07
  • The `*` quantifier matches *zero or more* characters. Therefore there is a match at each index. – Marko Topolnik Jun 04 '13 at 10:31
  • Yes, but from what I read, a single * is a greedy matcher. *+ is a reluctant one. So *+ should give me the output 01234456, which it does, I've tested it. However, simply using a * should work differently, i.e. in a greedy manner! – Nilay Panchal Jun 04 '13 at 10:42
  • No, `*?` is the reluctant one. `*+` would be the *possessive* quantifier, which is a completely different thing. – Marko Topolnik Jun 04 '13 at 10:46
  • My apologies, that was a mistake. I meant *?. But that still doesn't clarify anything about greedy Quantifiers :( – Nilay Panchal Jun 04 '13 at 11:48
  • I tried to answer your question, where you state "I understand the concepts but am a blurry about a few things". What the symbol for each quantifier is can easily be googled. For example, [my top Google hit](http://docs.oracle.com/javase/tutorial/essential/regex/quant.html) – Marko Topolnik Jun 04 '13 at 11:49
  • Okay, let me try to rephrase. http://docs.oracle.com/javase/tutorial/essential/regex/quant.html. In that explanation the greedy quantifier "consumes" the entire string and then "regurgitates" tokens one by one if the string does not match. Similarly, here, if we begin with the complete string ab34ef, this complete string already contains "one or more digits". So why would it not just execute once, start at 0 and print the entire string as m.group(), instead of going one by one, since the one-by-one method is used by reluctant quantifiers? – Nilay Panchal Jun 04 '13 at 11:51