Let's say I have the following String:

name1=gil;name2=orit;

I want to find all matches of name=value and make sure that the whole string matches the pattern.

So I did the following:

  1. Ensure that the whole pattern matches what I want.

    Pattern p = Pattern.compile("^((\\w+)=(\\w+);)*$");
    Matcher m = p.matcher(line);
    if (!m.matches()) {
        return false;
    }
    
  2. Iterate over the pattern name=value

    Pattern p = Pattern.compile("(\\w+)=(\\w+);");
    Matcher m = p.matcher(line);
    while (m.find()) {
        map.put(m.group(1), m.group(2));
    }
    

Is there some way to do this with one regex?

Bernhard Barker
gilsilas

3 Answers

You can validate and iterate over matches with one regex by:

  • Ensuring there are no unmatched characters between matches (e.g. name1=x;;name2=y;) by putting a \G at the start of the regex, which means "the end of the previous match" (or the start of the input for the first match).

  • Checking whether we've reached the end of the string on our last match by comparing the length of our string to Matcher.end(), which returns the offset after the last character matched.

Something like:

String line = "name1=gil;name2=orit;";
Pattern p = Pattern.compile("\\G(\\w+)=(\\w+);");
Matcher m = p.matcher(line);
int lastMatchPos = 0;
while (m.find()) {
   System.out.println(m.group(1));
   System.out.println(m.group(2));
   lastMatchPos = m.end();
}
if (lastMatchPos != line.length())
   System.out.println("Invalid string!");

Live demo.
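
The same idea can be wrapped in a helper that both validates the line and collects the pairs in a single loop. This is only a sketch: the class name PairParser, the method parse, and the choice of returning an Optional are assumptions made for illustration, not part of the original answer. Note that, like the `*` quantifier in the question's regex, it accepts an empty string and returns an empty map for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairParser {
    // \G anchors each match at the end of the previous one (or at the start of the input).
    private static final Pattern PAIR = Pattern.compile("\\G(\\w+)=(\\w+);");

    // Hypothetical helper: returns the parsed pairs, or Optional.empty()
    // if some part of the line is left unmatched.
    public static Optional<Map<String, String>> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        int lastMatchPos = 0;
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
            lastMatchPos = m.end();
        }
        // The line is valid only if the last match ended exactly at the end of the string.
        return lastMatchPos == line.length() ? Optional.of(pairs) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("name1=gil;name2=orit;")); // Optional[{name1=gil, name2=orit}]
        System.out.println(parse("name1=x;;name2=y;"));     // Optional.empty
    }
}
```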

Bernhard Barker

You have to enable multiline-mode for "^" and "$" to work as expected.

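Pattern p = Pattern.compile("^(?:(\\w+)=(\\w+);)*$", Pattern.MULTILINE);
Matcher m = p.matcher(line);
while (m.find()) {
    // A repeated capturing group keeps only its last iteration,
    // so each matching line yields just its final name=value pair.
    for (int i = 0; i + 2 <= m.groupCount(); i += 2) {
        map.put(m.group(i + 1), m.group(i + 2));
    }
}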

The comments were right: you still have to iterate through the matching groups for each line, and the outer group should be a non-capturing group (?:...).
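
One caveat worth checking when relying on groups inside a repeated group: in java.util.regex a capturing group that repeats keeps only the text of its last iteration, and groupCount() reflects the number of groups in the pattern, not the number of repetitions. The sketch below illustrates this; the class name RepeatedGroupDemo and the method lastPair are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Hypothetical helper: returns what groups 1 and 2 hold after a full match,
    // or null if the line does not match at all.
    public static String lastPair(String line) {
        Matcher m = Pattern.compile("^(?:(\\w+)=(\\w+);)*$").matcher(line);
        if (!m.matches()) {
            return null;
        }
        // groupCount() is 2 here regardless of how many pairs the line contains.
        return m.group(1) + "=" + m.group(2);
    }

    public static void main(String[] args) {
        // Only the last repetition is retained by the capturing groups.
        System.out.println(lastPair("name1=gil;name2=orit;")); // name2=orit
    }
}
```

So a single match of this pattern validates the line but cannot, on its own, return every pair.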

weaselflink
  • By default the regex engine matches in multiline mode. Did you want to use the DOTALL option? Also, given the example in the question, your regex won't work. – Anirudha May 29 '13 at 15:01
  • @Anirudh: No, by default, MULTILINE mode is not enabled in Java. DOTALL option will be useless here. – nhahtdh May 29 '13 at 15:04
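
The point about MULTILINE being off by default can be checked with a small experiment. This is a sketch only; the class name MultilineDemo, the helper findsLine, and the two-line sample input are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class MultilineDemo {
    // Hypothetical helper: reports whether the regex is found anywhere in the input
    // when compiled with the given flags.
    public static boolean findsLine(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "a=1;\nb=2;";
        // Without flags, ^ matches only at the very start of the input.
        System.out.println(findsLine("^b=2;$", 0, input));                 // false
        // With MULTILINE, ^ also matches just after a line terminator.
        System.out.println(findsLine("^b=2;$", Pattern.MULTILINE, input)); // true
    }
}
```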
String example = "name1=gil;name2=orit;";
Pattern pattern = Pattern.compile("((name[0-9]+?=(.+?);))+?");
Matcher matcher = pattern.matcher(example);
// verifies full match
if (matcher.matches()) {
    System.out.println("Whole String matched: " + matcher.group());
    // resets matcher
    matcher.reset();
    // iterates over found
    while (matcher.find()) {
        System.out.println("\tFound: " + matcher.group(2));
        System.out.println("\t--> name is: " + matcher.group(3));
    }
}

Output:

Whole String matched: name1=gil;name2=orit;
    Found: name1=gil;
    --> name is: gil
    Found: name2=orit;
    --> name is: orit
Mena
  • This really is 1 regex, but it requires 2 passes over the input string (once in `matches()` and once in the loop with `find()`) – nhahtdh May 29 '13 at 15:12
  • @nhahtdh You're right. But I wasn't aware of any "1 pass" limitation here. – Mena May 29 '13 at 15:25
  • Not really a limitation, just to say that it is not that much different from OP's current solution in terms of the number of passes. – nhahtdh May 29 '13 at 15:39
  • @nhahtdh You have a point. What led me to this is that the OP's solution didn't display nested passes, I guess. – Mena May 29 '13 at 15:44