If you're working with one word at a time, try this:
boolean isMatch = s.matches(
"(?i)^(?:J()|U()|G(?!.*G)()|[GLERS]()|\\w){4,}+$\\1\\2\\3\\4");
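If you're searching for matches in a longer string (the lookahead uses \w* rather than .* here, so it only looks for another g within the current word, not in the rest of the string):
Pattern p = Pattern.compile(
"(?i)\\b(?:J()|U()|G(?!\\w*G)()|[GLERS]()|\\w){4,}+\\b\\1\\2\\3\\4");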
Matcher m = p.matcher(s);
while (m.find()) {
String foundString = m.group();
}
Each time one of the first four alternatives - J(), U(), G() or [GLERS]() - matches something, the empty group following it "captures" nothing (i.e., an empty string). When the end of the string is reached, each of the backreferences - \1, \2, etc. - tries to match the same thing its corresponding group matched: nothing again.
Obviously, that will always succeed; you can always match nothing. The trick is that the backreference won't even try to match if its corresponding group didn't participate in the match. That is, if there's no j in the target string, the () in the J() alternative never gets involved. When the regex engine processes the \1 backreference later, it immediately reports failure because it knows that group hasn't participated in the match.
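To see that behavior in isolation, here is a minimal sketch with a much smaller pattern. The class name EmptyGroupDemo and the helper hasA are made up for illustration; the pattern accepts strings of a's and b's, but only if at least one a was consumed, because \1 fails when the empty group never participated.

```java
import java.util.regex.Pattern;

class EmptyGroupDemo {
    // Hypothetical helper: true only if s consists of a's and b's
    // and contains at least one 'a'.
    static boolean hasA(String s) {
        // The empty group after 'a' participates only when an 'a' is consumed.
        // If it never participated, the \1 backreference fails outright.
        return Pattern.matches("(?:a()|b)+\\1", s);
    }

    public static void main(String[] args) {
        System.out.println(hasA("bab")); // group 1 participated
        System.out.println(hasA("bbb")); // group 1 never participated
    }
}
```

In Java this should print true for "bab" and false for "bbb". Other regex flavors may treat a backreference to an unset group differently, so this sketch is only meant to describe java.util.regex.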
In this way, the empty groups act like check boxes, and the backreferences make sure all the boxes have been checked. There's one wrinkle, though. Both the G() and [GLERS]() alternatives can match g; how do you make sure they both participate in the match when you need them to? The first regex I tried,
"(?i)^(?:J()|U()|G()|[GLERS]()|\\w){4,}+$\\1\\2\\3\\4"
...failed to match the word "jugg" because the G() alternative was consuming both g's; [GLERS]() never got a chance to participate. So I added the negative lookahead - (?!.*G) - and now that alternative only matches the last g. If I had three alternatives that could match a g, I would have to add (?!.*G.*G) to the first one and (?!.*G) to the second. But realistically, I probably would have switched to a different approach (probably one not involving regexes) well before I reached that point. ;)
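For anyone who wants to check the difference between the two single-word patterns, here is a small sketch that runs both against a few sample words. The class name CheckBoxRegexDemo, the constant names and the word list are my own choices for illustration; the two patterns are the ones quoted above.

```java
class CheckBoxRegexDemo {
    // First attempt: G() is tried before [GLERS]() and takes every g it sees.
    static final String NAIVE =
        "(?i)^(?:J()|U()|G()|[GLERS]()|\\w){4,}+$\\1\\2\\3\\4";

    // With the lookahead, the G alternative only takes the last g,
    // leaving any earlier g for [GLERS]().
    static final String FIXED =
        "(?i)^(?:J()|U()|G(?!.*G)()|[GLERS]()|\\w){4,}+$\\1\\2\\3\\4";

    public static void main(String[] args) {
        for (String word : new String[] {"jugg", "juggle", "jugs", "jump"}) {
            System.out.printf("%-7s naive=%b fixed=%b%n",
                word, word.matches(NAIVE), word.matches(FIXED));
        }
    }
}
```

Of these sample words, only "jugg" should come out differently: the first pattern rejects it and the one with the lookahead accepts it. "juggle" and "jugs" pass both because another letter from [GLERS] is available to check the fourth box, and "jump" fails both because it has no g.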