1

I'm trying to capture assignment operations from a text file using 'java.util.regex.Pattern'. I've been very frustrated trying to fix my regular expression to actually recognize what I am looking for. I've simplified the problem as much as I can and found an issue with picking up white space.

This post proved helpful, and sheds light on issues dealing with the whitespace character set, but does not answer the question of why the following is not working:

Pattern p = Pattern.compile("adfa =");
Scanner sc = new Scanner("adfa =");

if(sc.hasNext(p))
{
    String s = sc.next(p);
    System.out.println(">" + s + "<");
}
else
    System.out.println(":(");

If I try this:

Pattern p = Pattern.compile("\\w+ *=");

The following string is picked up:

"adfa="

But not:

"adfa ="

Simply by making the following change:

Pattern p = Pattern.compile("adfa=");
Scanner sc = new Scanner("adfa=");

All works as intended! Can anyone shed any light on what is going wrong?

Daeden

2 Answers

5

From the documentation of Scanner#hasNext(Pattern):

Returns true if the next complete token matches the specified pattern. A complete token is prefixed and postfixed by input that matches the delimiter pattern.

The default delimiter pattern for Scanner is \p{javaWhitespace}+. You can confirm this with the Scanner#delimiter() method:

Scanner sc = new Scanner("abdc =");
System.out.println(sc.delimiter());  // Prints \p{javaWhitespace}+

So when your Scanner encounters whitespace in your string, it assumes the token has ended. It stops there and tries to match the token it has read so far ("adfa") against your pattern. That match fails, so sc.hasNext(p) returns false. This is the problem.
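
To make the tokenization visible, here is a minimal sketch using the pattern from the question. The class name TokenDemo and its helper methods are made up for illustration. The last helper uses Scanner#findInLine, which searches the input while ignoring delimiters; it is one possible alternative, not necessarily the right fit for every parser.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class TokenDemo {
    // True if the next whitespace-delimited token matches the regex.
    static boolean firstTokenMatches(String input, String regex) {
        return new Scanner(input).hasNext(Pattern.compile(regex));
    }

    // The next token as the Scanner sees it with the default delimiter.
    static String firstToken(String input) {
        return new Scanner(input).next();
    }

    // Searches the current line for the regex, ignoring delimiters.
    static String findIgnoringDelimiters(String input, String regex) {
        return new Scanner(input).findInLine(Pattern.compile(regex));
    }

    public static void main(String[] args) {
        String regex = "\\w+ *=";
        System.out.println(firstTokenMatches("adfa =", regex));              // false
        System.out.println(">" + firstToken("adfa =") + "<");                // >adfa<
        System.out.println(firstTokenMatches("adfa=", regex));               // true
        System.out.println(">" + findIgnoringDelimiters("adfa =", regex) + "<"); // >adfa =<
    }
}
```

The second line of output shows the cause: the token handed to the pattern is "adfa", so the space and "=" never reach the regex.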

Rohit Jain
  • EDIT: Just read that the default is indeed any whitespace. Thanks! Not sure how I managed to not read that and assume the delimiter was '\n' or EOF. – Daeden Jan 22 '13 at 23:22
  • @Daeden.. Try printing the value of `sc.delimiter()`. You would get `\p{javaWhitespace}+`. I hope that makes it clear. – Rohit Jain Jan 22 '13 at 23:24
2

From Scanner.hasNext(Pattern) javadoc: Returns true if the next complete token matches the specified pattern. A complete token is prefixed and postfixed by input that matches the delimiter pattern.

In Scanner, whitespace is the default delimiter, so in your example the Scanner tries to match the token "adfa" against the regex, and that does not match. If you change the delimiter to something else, like a line feed:

sc.useDelimiter("\n");

Your regex should work.
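
As a rough sketch of that change applied to the code from the question (the class name DelimiterDemo and the helper nextAssignment are made up for illustration, and the input is assumed to use plain "\n" line endings):

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class DelimiterDemo {
    // Returns the next token if it matches the regex, or null otherwise.
    // Tokens are split on line feeds, so spaces stay inside a token.
    static String nextAssignment(String input, String regex) {
        Scanner sc = new Scanner(input);
        sc.useDelimiter("\n");
        Pattern p = Pattern.compile(regex);
        return sc.hasNext(p) ? sc.next(p) : null;
    }

    public static void main(String[] args) {
        System.out.println(">" + nextAssignment("adfa =", "\\w+ *=") + "<"); // >adfa =<
    }
}
```

With this delimiter each token is a whole line, so the pattern has to match the entire line for hasNext(p) to return true.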

EDIT: My answer is a bit late!

German