To find the words before and after a given word, you can use this regex:
(\w+)\W+problem\W+(\w+)
Capture group 1 holds the word before the given word, and group 2 holds the word after it.
In Java, that would be:
Pattern p = Pattern.compile("(\\w+)\\W+problem\\W+(\\w+)");
Matcher m = p.matcher("This problem sucks and is hard");
if (m.find())
    System.out.printf("'%s', '%s'", m.group(1), m.group(2));
Output
'This', 'sucks'
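If the word to search for is not hard-coded, the same pattern can be built at runtime. The following is only a sketch: the class name NeighborWords and the method firstNeighbors are made up for illustration, and Pattern.quote is used on the assumption that the search word should be matched literally rather than interpreted as a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NeighborWords {
    // Hypothetical helper: returns {before, after} for the first occurrence
    // of `word` in `text`, or null if there is no match.
    static String[] firstNeighbors(String word, String text) {
        // Pattern.quote escapes regex metacharacters in the search word.
        Pattern p = Pattern.compile("(\\w+)\\W+" + Pattern.quote(word) + "\\W+(\\w+)");
        Matcher m = p.matcher(text);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] r = firstNeighbors("problem", "This problem sucks and is hard");
        System.out.printf("'%s', '%s'%n", r[0], r[1]); // prints 'This', 'sucks'
    }
}
```

Returning null when nothing matches is just one option; an Optional or an empty array would work equally well.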
If you want full Unicode support, add the flag UNICODE_CHARACTER_CLASS, or use the inline form (?U):
Pattern p = Pattern.compile("(?U)(\\w+)\\W+problema\\W+(\\w+)");
Matcher m = p.matcher("Questo problema è schifoso e dura");
if (m.find())
    System.out.printf("'%s', '%s'", m.group(1), m.group(2));
Output
'Questo', 'è'
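To see what the flag changes, here is a small sketch that runs the same regex with and without Pattern.UNICODE_CHARACTER_CLASS (passed as a compile flag instead of the inline form). The class and method names are made up for illustration, and the è is written as \u00e8 only so the example does not depend on the source file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeFlagDemo {
    static final String REGEX = "(\\w+)\\W+problema\\W+(\\w+)";

    // Hypothetical helper: returns "before/after" for the first match, or "" if none.
    static String neighbors(String text, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        return m.find() ? m.group(1) + "/" + m.group(2) : "";
    }

    public static void main(String[] args) {
        String text = "Questo problema \u00e8 schifoso e dura";
        // Without the flag, \w is ASCII-only, so the accented letter is
        // treated as a separator and skipped: Questo/schifoso
        System.out.println(neighbors(text, 0));
        // With the flag, the accented letter counts as a word character,
        // so it is returned as the following word.
        System.out.println(neighbors(text, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

Note that without the flag the match does not fail. It silently returns the wrong following word, which is easy to miss.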
To find multiple matches, use a while loop:
Pattern p = Pattern.compile("(?U)(\\w+)\\W+problems\\W+(\\w+)");
Matcher m = p.matcher("Big problems or small problems, they are all just problems, man!");
while (m.find())
    System.out.printf("'%s', '%s'%n", m.group(1), m.group(2));
Output
'Big', 'or'
'small', 'they'
'just', 'man'
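One thing to be aware of with the loop: matches do not overlap, so a word that sits between two occurrences is consumed by the first match and the second occurrence is not reported. If that matters, one possible workaround (a variation, not part of the regex above) is to capture the following word inside a lookahead so it is not consumed. A sketch, with made-up class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllNeighbors {
    // Hypothetical helper: collects "before/after" for every match of regex in text.
    static List<String> collect(String regex, String text) {
        List<String> pairs = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            pairs.add(m.group(1) + "/" + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "a problem b problem c";
        // Consuming pattern: "b" is used up by the first match.
        System.out.println(collect("(\\w+)\\W+problem\\W+(\\w+)", text));     // [a/b]
        // Lookahead variant: the following word is captured but not consumed.
        System.out.println(collect("(\\w+)\\W+problem(?=\\W+(\\w+))", text)); // [a/b, b/c]
    }
}
```

For the "Big problems or small problems" input above, both patterns give the same three pairs (with "problems" in place of "problem"), since no word is shared between two matches there.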
Note: The use of \W+ allows symbols to occur between words, e.g. "No(!) problem here" will still find "No" and "here".
Also note that a number is considered a word: "I found 1 problem here" returns "1" and "here".
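Both notes can be checked with a few lines. This is just a sketch with a made-up class name. The third input is an extra case not mentioned above: since a hyphen is also matched by \W, only the last part of a hyphenated word is returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorDemo {
    static final Pattern P = Pattern.compile("(\\w+)\\W+problem\\W+(\\w+)");

    // Hypothetical helper: returns "before/after" for the first match, or "" if none.
    static String neighbors(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) + "/" + m.group(2) : "";
    }

    public static void main(String[] args) {
        System.out.println(neighbors("No(!) problem here"));      // No/here
        System.out.println(neighbors("I found 1 problem here"));  // 1/here
        System.out.println(neighbors("well-known problem here")); // known/here
    }
}
```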