Please take a look at the following code:

    public static void main(String[] args) {
        String s = "a < b > c > d";
        String regex = "(\\w\\s*[<>]\\s*\\w)";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(s);
        int i = 0;
        while (m.find()) System.out.println(m.group(i++));
    }

The output of the above program is: a < b, c > d

But I actually expect a < b, b > c, c > d.

Is there anything wrong with my regexp here?

ekad
Gelin Luo

3 Answers

You're right that "b > c" matches the regex.

But each call to Matcher::find() returns the next substring of the input that matches the regex and is disjoint from the matches returned by previous find() calls: the search resumes at the first character after the previous match. Since "b > c" begins with the 'b' that was already part of the "a < b" match returned by the previous call, find() will not return it.
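
To make this concrete, here is a minimal sketch contrasting plain find() with one possible way to get overlapping matches: restarting the search one character past the start of each match using Matcher.find(int). The class and method names (OverlapDemo, disjointMatches, overlappingMatches) are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // Same pattern as in the question, minus the redundant outer group.
    private static final Pattern PAIR = Pattern.compile("\\w\\s*[<>]\\s*\\w");

    // Plain find(): each search resumes where the previous match ended,
    // so matches never share characters.
    static List<String> disjointMatches(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // find(int): restart one character after the START of the previous match,
    // so a later match may reuse characters an earlier match consumed.
    static List<String> overlappingMatches(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        int from = 0;
        while (from <= input.length() && m.find(from)) {
            result.add(m.group());
            from = m.start() + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a < b > c > d";
        System.out.println(disjointMatches(s));    // [a < b, c > d]
        System.out.println(overlappingMatches(s)); // [a < b, b > c, c > d]
    }
}
```

Note that this only behaves well for single-character operands; with \w+ the restart position would also produce suffix matches such as "bc > x".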

asdfjklqwer

Try this.

    String s = "a < b > c > d";
    String regex = "(?=(\\w{1}\\s{1}[<>]{1}\\s{1}\\w{1})).";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    while(m.find()) {
        System.out.println(m.group(1));
    }

Updated (based on green's solution):

    String s = " something.js > /some/path/to/x19-v1.0.js < y < z < a > b > c > d";
    String regex = "(?=[\\s,;]+|(?<![\\w\\/\\-\\.])([\\w\\/\\-\\.]+\\s*[<>]\\s*[\\w\\/\\-\\.]+))";

    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    while (m.find()) {
        String d = m.group(1);
        if(d != null) {
            System.out.println(d);
        }
    }
Prince John Wesley
  • That works for this specific string "a < b > c > d", but when I change it to "abc > x > y", it fails. If I change the regex to "(?=(\\w+\\s*[<>]{1}\\s*\\w+)).", it outputs abc > x, bc > x, c > x, x > y, where "bc > x" and "c > x" are not expected. – Gelin Luo Apr 02 '11 at 06:10
  • By adding some boundary matchers I finally made it work! See my answer to this question. John got the credit, however ;) – Gelin Luo Apr 04 '11 at 01:22
  • Fantastic! A very small update to the regex in order to support CDN js path (which include 'http:'): (?=[\\s,;]+|(?<![\\w\\/\\-\\.:])([\\w\\/\\-\\.]+\\s*[<>]\\s*[\\w\\/\\-\\.:]+)) – Gelin Luo Apr 05 '11 at 05:20
  • oops, looks like it does not work for this string "http://ahost.com/something.js > http://zbc-1.com.au/some/path/to/x19-v1.0.js < y < z < a > b > c > d" – Gelin Luo Apr 05 '11 at 05:25
  • This one works with the bug string finally: "(?=[\\s,;]+|(?<![\\w\\/\\-\\.:])([\\w\\/\\-\\.:]+\\s*[<>]\\s*[\\w\\/\\-\\.:]+))" – Gelin Luo Apr 05 '11 at 05:27
  • while(m.find()) is not a looping condition... at least link is telling me it's not. It should be an if(m.find()) – JPM Aug 05 '16 at 19:49
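
The suffix matches described in the first comment can be reproduced, and suppressed with a boundary check, in a small sketch like the following. It assumes the operands are plain \w+ words; the class name LookaheadBoundaryDemo, the helper captures, and the (?<!\\w) lookbehind are illustrative choices, not the exact regex from the answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadBoundaryDemo {
    // Collects capture group 1 for every position where the regex matches.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // The lookahead captures a pair without consuming it; the trailing '.'
        // consumes one character so the next search starts one position later.
        String unanchored = "(?=(\\w+\\s*[<>]\\s*\\w+)).";
        // Assumption: the negative lookbehind only lets a pair start where the
        // preceding character is not a word character.
        String anchored = "(?<!\\w)(?=(\\w+\\s*[<>]\\s*\\w+)).";

        System.out.println(captures(unanchored, "abc > x > y")); // [abc > x, bc > x, c > x, x > y]
        System.out.println(captures(anchored, "abc > x > y"));   // [abc > x, x > y]
        System.out.println(captures(anchored, "a < b > c > d")); // [a < b, b > c, c > d]
    }
}
```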

Based on John's solution, with some boundary matchers added, this finally works.

    String s = " something.js > /some/path/to/x19-v1.0.js < y < z < a > b > c > d";
    String regex = "(?=[\\s,;]+([\\w\\/\\-\\.]+\\s*[<>]\\s*[\\w\\/\\-\\.]+)[\\s,;$]*).";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s);
    while(m.find()) {
        System.out.println(m.group(1));
    }
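
One property of this regex worth noting: the lookahead begins with [\\s,;]+, so every pair must be preceded by at least one delimiter, which appears to be why the sample string starts with a space. The sketch below (hypothetical class LeadingDelimiterDemo and method pairs) shows the effect on a shorter input; prepending a space before matching is one possible workaround, offered as an assumption rather than as part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingDelimiterDemo {
    // The regex from the answer above, unchanged.
    private static final Pattern PAIRS = Pattern.compile(
            "(?=[\\s,;]+([\\w\\/\\-\\.]+\\s*[<>]\\s*[\\w\\/\\-\\.]+)[\\s,;$]*).");

    // Returns capture group 1 of every match found in the input.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIRS.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // With a leading space, the first pair is found.
        System.out.println(pairs(" a < b > c")); // [a < b, b > c]
        // Without it, "a < b" has no delimiter in front and is skipped.
        System.out.println(pairs("a < b > c"));  // [b > c]
        // Possible workaround (an assumption): prepend a space before matching.
        System.out.println(pairs(" " + "a < b > c")); // [a < b, b > c]
    }
}
```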
Gelin Luo