I'm using StringUtils.countMatches to count word frequencies. Is there a way to search text for words that start with certain characters?

Example:

Searching for art in "artificial art in my apartment" returns 3, but I need it to return 2, counting only the words that start with art.

My solution was to replace \r and \n in the text with a space and modify the code to be:

text = text.replaceAll("(\r\n|\n)"," ").toLowerCase();
searchWord = " "+searchWord.toLowerCase();
StringUtils.countMatches(text, searchWord);
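
One thing worth checking with the leading-space approach is the first word of the text, which has no space in front of it. The sketch below uses a hypothetical countOccurrences helper, written with plain String.indexOf as a stand-in for StringUtils.countMatches so that it needs no external library. Padding the text with a leading space is one possible workaround and is an assumption here, not part of the code above.

```java
// Hypothetical stand-in for StringUtils.countMatches (non-overlapping substring count),
// written with String.indexOf so the sketch runs without Apache Commons Lang.
class LeadingSpaceCheck {

    static int countOccurrences(String text, String sub) {
        if (sub.isEmpty()) {
            return 0; // avoid looping forever on an empty search string
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(sub, from)) != -1) {
            count++;
            from += sub.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "artificial art in my apartment";
        System.out.println(countOccurrences(text, "art"));        // plain substring count
        System.out.println(countOccurrences(text, " art"));       // misses the first word
        System.out.println(countOccurrences(" " + text, " art")); // padded text (assumed workaround)
    }
}
```

With the example sentence this prints 3, 1, and 2: the bare " art" search skips "artificial" because nothing precedes it.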

I also tried the following Regex:

patternString = "\\b(" + searchWord.toLowerCase().trim() + "([a-zA-Z]*))";
pattern = Pattern.compile(patternString);
matcher = pattern.matcher(text.toLowerCase());

Questions:

- Does my first solution make sense, or is there a better way to do this?

- Is my second solution faster? I'm working with large text files and a decent number of search words.

Thanks

PhDeveloper
  • Why not use regular expressions? See [`java.util.regex.Pattern`](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html). – 0xbe5077ed Jun 18 '14 at 15:00

2 Answers

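text = text.replaceAll("(\r\n|\n)", " ").toLowerCase();
searchWord = searchWord.toLowerCase();
String[] words = text.split(" ");
int count = 0;
for (String word : words) {
    // Compare the first searchWord.length() characters of each word with searchWord;
    // <= also counts a word that equals searchWord exactly.
    if (searchWord.length() <= word.length()
            && word.substring(0, searchWord.length()).equals(searchWord)) {
        count++;
    }
}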

A plain loop over the words gives the same result without a regular expression.
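
Since the question mentions large files and several search words, the loop idea can be extended to count every search word in one pass over the text. This is only a sketch under assumptions: the class name PrefixWordCounter and the method countPrefixes are hypothetical, it uses String.startsWith for the prefix test, and it splits on any run of whitespace, so words with leading punctuation such as "(art" are not matched.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the name and signature are illustrative only.
class PrefixWordCounter {

    // For each search word, counts the words in text that start with it (case-insensitive).
    static Map<String, Integer> countPrefixes(String text, List<String> searchWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : searchWords) {
            counts.put(s.toLowerCase(), 0);
        }
        // Splitting on \s+ treats \r, \n, tabs, and spaces alike, so no replace step is needed.
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (String word : words) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                if (word.startsWith(entry.getKey())) {
                    entry.setValue(entry.getValue() + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> result =
                countPrefixes("artificial art in\nmy apartment", Arrays.asList("art", "a"));
        System.out.println(result); // prints {art=2, a=3}
    }
}
```

The text is lowercased and split once, and each word is then checked against every prefix, so the cost grows with the number of words times the number of search words.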

  • If you want to use regular expressions exclusively, then @Duncan's answer works. –  Jun 18 '14 at 15:14
  • Thanks for the answer. I changed the if statement to <= so that "art" by itself is included, and found no need for the extra space in the 2nd line. I also changed the substring to start from 0 to word.length, and it works. Cheers – PhDeveloper Jun 18 '14 at 19:07

Use a regular expression to count the words that start with art. The pattern to use is:

\b<search-word>

Here, \b matches a word boundary. In a Java string literal the backslash itself must be escaped, so the pattern string is written with \\b. Below is an example:

String input = "artificial art in my apartment";
Matcher matcher = Pattern.compile("\\bart").matcher(input);

int count = 0;
while (matcher.find()) {
    count++;
}

System.out.println(count);

Output: 2
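
To reuse this for arbitrary search words, the same pattern can be wrapped in a small method. The following is a sketch, not a definitive implementation: the class RegexPrefixCounter and the method countWordsStartingWith are hypothetical names. It adds two things not shown above: Pattern.quote, so that characters like '.' in the search word are matched literally, and Pattern.CASE_INSENSITIVE, so the text does not need to be lowercased first. It assumes the search word begins with a letter, digit, or underscore, since \b is only meaningful next to a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and signature are illustrative only.
class RegexPrefixCounter {

    static int countWordsStartingWith(String text, String searchWord) {
        // \b anchors the match at a word boundary; Pattern.quote keeps searchWord literal.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(searchWord), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Artificial" and "art" match; "apartment" does not.
        System.out.println(countWordsStartingWith("Artificial art in my apartment", "art")); // prints 2
    }
}
```

When the same search words are applied to many texts, compiling each Pattern once and reusing it avoids repeated compilation.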

Duncan Jones