0

Say I have a string

String str = "This problem sucks and is hard";

and I want to get the words before and after "problem", so "This" and "sucks". Is regex the best way to accomplish this (keeping in mind that I'm a beginner with regex), or does Java have some kind of library (e.g. StringUtils) that can do this for me?

dumbPotato21
nitsua

3 Answers

1

To find the words before and after a given word, you can use this regex:

(\w+)\W+problem\W+(\w+)

The capture groups are the words you're looking for.

In Java, that would be:

Pattern p = Pattern.compile("(\\w+)\\W+problem\\W+(\\w+)");

Matcher m = p.matcher("This problem sucks and is hard");
if (m.find())
    System.out.printf("'%s', '%s'", m.group(1), m.group(2));

Output

'This', 'sucks'


If you want full Unicode support, add flag UNICODE_CHARACTER_CLASS, or inline as (?U):

Pattern p = Pattern.compile("(?U)(\\w+)\\W+problema\\W+(\\w+)");

Matcher m = p.matcher("Questo problema è schifoso e dura");
if (m.find())
    System.out.printf("'%s', '%s'", m.group(1), m.group(2));

Output

'Questo', 'è'


For finding multiple matches, use a while loop:

Pattern p = Pattern.compile("(?U)(\\w+)\\W+problems\\W+(\\w+)");

Matcher m = p.matcher("Big problems or small problems, they are all just problems, man!");
while (m.find())
    System.out.printf("'%s', '%s'%n", m.group(1), m.group(2));

Output

'Big', 'or'
'small', 'they'
'just', 'man'

Note: The use of \W+ allows symbols to occur between words, e.g. "No(!) problem here" will still find "No" and "here".

Also note that a number is considered a word: "I found 1 problem here" returns "1" and "here".
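The two notes above can be checked with a small helper. This is only a sketch: the class name `WordNeighbors` and the method `find` are made up for illustration, and `Pattern.quote` is added so the search word is treated literally, which the patterns above did not need because "problem" contains no regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class WordNeighbors {

    /** Returns one {before, after} pair for every match of the search word. */
    static List<String[]> find(String text, String word) {
        // Pattern.quote makes the search word literal, so input such as "c++" is safe.
        Pattern p = Pattern.compile("(?U)(\\w+)\\W+" + Pattern.quote(word) + "\\W+(\\w+)");
        Matcher m = p.matcher(text);
        List<String[]> pairs = new ArrayList<>();
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] samples = {
            "This problem sucks and is hard",
            "No(!) problem here",       // symbols between the words
            "I found 1 problem here"    // a number counts as a word
        };
        for (String s : samples) {
            for (String[] pair : find(s, "problem")) {
                System.out.printf("'%s', '%s'%n", pair[0], pair[1]);
            }
        }
    }
}
```

Because the pattern needs a word on both sides, a search word at the very start or end of the text produces no match.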

Andreas
  • Great, I think this is gonna be what I want. What would be the best way to apply this to a case where there are multiple occurrences of "problem"? – nitsua May 26 '17 at 23:31
  • You can change `if` to `while`, and it will find all matches. – Andreas May 27 '17 at 00:21
0

Apache Commons Lang has a StringUtils class with methods that return the substring before or after a given separator. You can also use Java's own String.substring() to get what you need.

Apache StringUtils library API: https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html

The methods that you might need are substringBefore() and substringAfter().

https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#substringBefore(java.lang.String,%20java.lang.String)

If you want to explore Java's own API, see the question "Java: Getting a substring from a string starting after a particular character".
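As a rough sketch of the substring approach using only java.lang.String (so no extra dependency is needed): `indexOf` plus `substring` yields the same two halves that StringUtils' substringBefore() and substringAfter() would return, and splitting each half on whitespace gives the neighbouring words. The class name `SubstringNeighbors` and the method `neighbors` are invented for this example. It assumes words are separated by whitespace, looks only at the first occurrence, and matches the search word as a plain substring, so "problems" also counts as containing "problem".

```java
// Hypothetical helper, not part of any library.
class SubstringNeighbors {

    /** Returns {wordBefore, wordAfter}; an entry is "" when there is no such word. */
    static String[] neighbors(String text, String word) {
        int at = text.indexOf(word);
        if (at < 0) {
            return new String[] { "", "" };
        }
        // The text on either side of the first occurrence of the search word.
        String before = text.substring(0, at).trim();
        String after = text.substring(at + word.length()).trim();

        String[] left = before.split("\\s+");
        String[] right = after.split("\\s+");
        // Splitting an empty string yields [""], which covers the "no word" case.
        return new String[] { left[left.length - 1], right[0] };
    }

    public static void main(String[] args) {
        String[] n = neighbors("This problem sucks and is hard", "problem");
        System.out.println(n[0] + ", " + n[1]);
    }
}
```

Unlike the regex answer, punctuation stays attached to the returned words, so this only suits simple, space-separated input.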

Grinish Nepal
0

A bit verbose, but this finds the word before the search word without regex:

import java.util.ArrayList;
import java.util.List;
public class HelloWorld{

public static void main(String []args){
    String EntireString="Hello World this is a test";
    String SearchWord="World";
    System.out.println(getPreviousWordFromString(EntireString,SearchWord));
}
 
public static String getPreviousWordFromString(String EntireString, String SearchWord) {
    List<Integer> IndicesOfWords = new ArrayList<>();

    boolean isWord = false;

    int indexOfSearchWord=-1;

    if(EntireString.indexOf(SearchWord)!=-1) {
        indexOfSearchWord = EntireString.indexOf(SearchWord)-1;
    } else {
        System.out.println("ERROR: SearchWord passed (2nd arg) does not exist in string EntireString. EntireString: "+EntireString+" SearchWord: "+SearchWord);
        return "";
    }
    
    if(EntireString.indexOf(SearchWord)==0) {
        System.out.println("ERROR: The search word passed is the first word in the search string, so there are no words before it.");
        return "";
    }

    for (int i = 0; i < EntireString.length(); i++) {
        if (Character.isLetter(EntireString.charAt(i)) && i != indexOfSearchWord) {
            isWord = true;                                    
        } else if (!Character.isLetter(EntireString.charAt(i)) && isWord) {
            IndicesOfWords.add(i);
            isWord = false;
        } else if (Character.isLetter(EntireString.charAt(i)) && i == indexOfSearchWord) {
            IndicesOfWords.add(i);
        }
    }
    
    if(IndicesOfWords.size()>0) {
        boolean isFirstWordAWord=true;
        for (int i = 0; i < IndicesOfWords.get(0); i++) {
            if(!Character.isLetter(EntireString.charAt(i))) {
                isFirstWordAWord=false;
            }
        }
        if(isFirstWordAWord) {
            // The string starts with a word, so index 0 is also a word boundary.
            IndicesOfWords.add(0,0);
        }
    } else {
        return "";
    }
    
    String ResultingWord = "";


    for (int i = IndicesOfWords.size()-1; i >= 0; i--) {

        if (EntireString.substring(IndicesOfWords.get(i)).contains(SearchWord)) { 
            if (i > 0) {
                ResultingWord=EntireString.substring(IndicesOfWords.get(i-1),IndicesOfWords.get(i));
                break;
            }
            if (i==0) {
                ResultingWord=EntireString.substring(IndicesOfWords.get(0),IndicesOfWords.get(1));
            }
        }
    }

    return ResultingWord;
}
}
Collin