I'm trying to process strings with repeated chars in order to find the correct word in a dictionary.

The approach I must use is to find words in which the same letter appears 3 or more times in a row and reduce each such run to 2 letters.

Then I'll check whether the resulting word exists in the dictionary. If it doesn't, I must reduce each pair of identical consecutive letters to a single letter.

Example:

gooooooood -> good (this exists)
awesooooome -> awesoome (this doesn't exist) -> awesome (this exists)
aaawwwesooooooommmme -> aawwesoomme (this doesn't exist) -> awesome (this exists)

I'm working with Java, and I'm already using this regular expression to find the words in a string that contain 3 or more repeated letters:

Pattern p = Pattern.compile("\\b\\w*(\\w)\\1{2}\\w*");
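
For reference, here is a minimal sketch of how that pattern can be used to pull the candidate words out of a longer string. The class name `RepeatedLetterFinder`, the method `findCandidates`, and the sample sentence are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class RepeatedLetterFinder {
    // Pattern from the question: a whole word that contains the same
    // word character at least 3 times in a row.
    static final Pattern REPEATED = Pattern.compile("\\b\\w*(\\w)\\1{2}\\w*");

    // Returns every word in the text that matches the pattern.
    static List<String> findCandidates(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            words.add(m.group()); // group() is the entire matched word
        }
        return words;
    }

    public static void main(String[] args) {
        // "good" has only two o's in a row, so it is not reported.
        System.out.println(findCandidates("this is gooooooood and awesooooome, not good"));
    }
}
```

With the sample sentence above, only `gooooooood` and `awesooooome` are collected.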
Rafael

2 Answers

You can use this regex ("pure version"):

(\b\w*?)(\w)\2{2,}(\w*)

As a Java string literal:

"(\\b\\w*?)(\\w)\\2{2,}(\\w*)"

Use it with replaceAll(regex, "$1$2$2$3"), which keeps two copies of the repeated character.

Explanation

(\b\w*?) // capture group 1: the start of the word, matched lazily
(\w)     // capture group 2: the first occurrence of the repeated char
\2{2,}   // the same char repeated 2 or more additional times (3+ in total)
(\w*)    // capture group 3: the rest of the word

Note that the $number in the replacement refers to the contents of the corresponding capture group.
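
A minimal sketch of applying this regex follows. One thing to be aware of: because group 3 (`\w*`) consumes the rest of the word, a single `replaceAll` call shortens only the first run of 3 or more in each word. For words with several runs, one option (an assumption on my part, not something stated above) is to repeat the replacement until the string stops changing. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the regex above.
class FirstRunCollapser {
    static final Pattern RUN = Pattern.compile("(\\b\\w*?)(\\w)\\2{2,}(\\w*)");

    // One pass: in each word, the first run of 3+ identical chars becomes 2.
    static String collapseOnce(String text) {
        return RUN.matcher(text).replaceAll("$1$2$2$3");
    }

    // Repeats the pass until nothing changes, so every run is shortened.
    // Each successful pass makes the string shorter, so the loop terminates.
    static String collapseAll(String text) {
        String previous;
        String current = text;
        do {
            previous = current;
            current = collapseOnce(previous);
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(collapseOnce("gooooooood"));          // good
        System.out.println(collapseOnce("awesooooome"));         // awesoome
        // Only the leading run of a's is shortened by a single pass:
        System.out.println(collapseOnce("aaawwwesooooooommmme"));
        System.out.println(collapseAll("aaawwwesooooooommmme")); // aawwesoomme
    }
}
```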

Laurel

You can also do it like this:

Pattern pattern = Pattern.compile("(\\w)\\1{2,}");
System.out.println(pattern.matcher("gooooooood").replaceAll("$1$1"));
System.out.println(pattern.matcher("awesooooome").replaceAll("$1$1"));
System.out.println(pattern.matcher("aaawwwesooooooommmme").replaceAll("$1$1"));

Output:

good
awesoome
aawwesoomme

And for the second step, here is how you can do it:

Pattern pattern2 = Pattern.compile("(\\w)\\1");
System.out.println(pattern2.matcher("awesoome").replaceAll("$1"));
System.out.println(pattern2.matcher("aawwesoomme").replaceAll("$1"));

Output:

awesome
awesome
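
Putting the two steps together with the dictionary check from the question could look roughly like the sketch below. The `DICTIONARY` set is a tiny stand-in for whatever dictionary you actually use, and `WordNormalizer` and `normalize` are hypothetical names. Note that the dictionary is consulted between the two steps; otherwise "good" would be reduced further to "god".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper class combining both steps with a dictionary lookup.
class WordNormalizer {
    static final Pattern THREE_OR_MORE = Pattern.compile("(\\w)\\1{2,}");
    static final Pattern DOUBLE = Pattern.compile("(\\w)\\1");

    // Stand-in dictionary, assumed for illustration only.
    static final Set<String> DICTIONARY =
            new HashSet<>(Arrays.asList("good", "awesome"));

    // Returns the first dictionary word found, or empty if neither step yields one.
    static Optional<String> normalize(String word) {
        // Step 1: shorten every run of 3+ identical chars to exactly 2.
        String doubled = THREE_OR_MORE.matcher(word).replaceAll("$1$1");
        if (DICTIONARY.contains(doubled)) {
            return Optional.of(doubled);
        }
        // Step 2: shorten every remaining pair to a single char.
        String single = DOUBLE.matcher(doubled).replaceAll("$1");
        if (DICTIONARY.contains(single)) {
            return Optional.of(single);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(normalize("gooooooood"));           // Optional[good]
        System.out.println(normalize("awesooooome"));          // Optional[awesome]
        System.out.println(normalize("aaawwwesooooooommmme")); // Optional[awesome]
    }
}
```

Since step 2 collapses all pairs at once, a word that needs one pair kept and another removed (for example "cooolll" for "cool") would not be found by this sketch and would need extra handling.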
Nicolas Filotto