Here is an example:

The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.

Now, let's say I have two ranks, "Officer" and "Senior Officer", and want to replace the name after each of them with a general token, "PERSON". As you can see, three names come after a rank: Stuart, Jess, and George. I don't know why my regex solution fails to capture all of them. Here is my code:

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        ArrayList<String> ranks = new ArrayList<String>();
        ranks.add("Senior Officer");
        ranks.add("Officer");
        for (String rank : ranks) {
            Pattern pattern = Pattern.compile(".*" + rank + " ([a-zA-Z]*?) .*");
            Matcher m = pattern.matcher(input);
            if (m.find()) {
                System.out.println(rank);
                System.out.println(m.group(1));
            }
        }
    }

and here is its output:

Senior Officer
Stuart
Officer
Stuart

which captures Stuart twice (via Senior Officer and Officer), but ignores Jess and George. I am expecting to get this as the output:

Senior Officer
Stuart
Officer
Stuart
Officer
Jess
Officer
George
user3639557

2 Answers

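Drop the leading and trailing .* from the pattern and call find() in a while loop instead of an if. This will be sufficient: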

for (String rank : ranks) {
    Pattern pattern = Pattern.compile("\\b" + rank + "\\s+([a-zA-Z]*)");
    Matcher m = pattern.matcher(input);
    while (m.find()) {
        System.out.println(rank);
        System.out.println(m.group(1));
    }
}
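
Since the question's end goal is to replace each name with the token PERSON, the same per-rank pattern can drive a replacement instead of a printout. The following is a minimal sketch, not part of the original answer: the class name RankMasker and the method maskNames are hypothetical, and Pattern.quote is added on the assumption that a rank might contain regex metacharacters.

```java
import java.util.List;
import java.util.regex.Pattern;

final class RankMasker {

    // Hypothetical helper: replaces the word that follows each rank with "PERSON".
    static String maskNames(String input, List<String> ranks) {
        String result = input;
        for (String rank : ranks) {
            // Group 1 keeps the rank itself; the name after it is matched but not captured.
            String regex = "\\b(" + Pattern.quote(rank) + ")\\s+[a-zA-Z]+";
            result = result.replaceAll(regex, "$1 PERSON");
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        System.out.println(maskNames(input, List.of("Senior Officer", "Officer")));
    }
}
```

The pass for "Officer" also sees the "Officer PERSON" left behind by the "Senior Officer" pass, but it rewrites that text to itself, so the overlap between the two ranks is harmless here.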

Regex Breakdown (as per comments)

Officer                        # Match "Officer" literally
 (                             # Capturing group
  (?:                          # Non-capturing group
    \s                         # Match a whitespace character
    (?!(?:Senior\s+)?Officer)  # Negative lookahead: the next word must not be "Officer", optionally preceded by "Senior"
    [A-Z][a-zA-Z]*             # Match a capital letter followed by any number of upper- or lower-case letters
  )*                           # Repeat the group until the next word does not start with a capital letter or is a rank
 )
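
Put back on one line, the pattern described in the breakdown can be tried as follows. This is only a sketch: the single-line pattern is reassembled from the pieces above, and the class name MultiWordNameDemo, the method namesAfterOfficer, and the sample sentence with a two-word name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultiWordNameDemo {

    // Pattern reassembled from the breakdown; group 1 holds every capitalized word after the rank.
    private static final Pattern NAME_AFTER_OFFICER = Pattern.compile(
            "Officer((?:\\s(?!(?:Senior\\s+)?Officer)[A-Z][a-zA-Z]*)*)");

    // Hypothetical helper: collects the (possibly multi-word) name after each "Officer".
    static List<String> namesAfterOfficer(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME_AFTER_OFFICER.matcher(input);
        while (m.find()) {
            // Group 1 starts with the separating whitespace, so trim it.
            names.add(m.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sentence with a two-word name, not taken from the question.
        String input = "Senior Officer Stuart Little and Officer Jess met Officer George.";
        System.out.println(namesAfterOfficer(input));
    }
}
```

Because the group is repeated with *, an "Officer" that is not followed by a capitalized word still matches and yields an empty string, which a caller may want to filter out.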
rock321987

The if (m.find()) inside your for loop finds ONLY the first match of each rank. First of all, you need a while loop inside the for, so that find() is called until no match remains.

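    for (String rank : ranks) {
        // The parentheses create capture group 1, which m.group(1) reads below.
        // [A-Za-z] is used because [A-z] also matches [, \, ], ^, _ and `.
        Pattern pattern = Pattern.compile(rank + " ([A-Za-z]+)");
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            System.out.println(rank);
            System.out.println(m.group(1));
        }
    }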

However, this does not solve the problem of finding the name after "Senior Officer" twice: once when you search for "Senior Officer" and once when you search for "Officer". I am not sure how you want to handle this issue. If you want Stuart to appear twice, then this code is good enough. If you want Stuart to be detected only once, you need to work on your regex.
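
One possible way to detect each name only once is to merge all ranks into a single alternation and scan the input in one pass. This is a sketch under that assumption, not the only fix: the class name RankNameScanner and the method findRankedNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class RankNameScanner {

    // Hypothetical helper: returns one "rank: name" entry per name found in the input.
    static List<String> findRankedNames(String input, List<String> ranks) {
        // Build one alternation of quoted ranks, e.g. (\QSenior Officer\E|\QOfficer\E).
        String alternation = ranks.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\s+([A-Za-z]+)");
        Matcher m = pattern.matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // Group 1 is the rank that matched, group 2 is the name after it.
            found.add(m.group(1) + ": " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        for (String entry : findRankedNames(input, List.of("Senior Officer", "Officer"))) {
            System.out.println(entry);
        }
    }
}
```

Because find() resumes after the end of the previous match, the "Officer" inside "Senior Officer Stuart" has already been consumed when the scan continues, so Stuart is reported a single time.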

P.S. Use an online tool to test the regex before coding it. It saves a lot of time.