4

I want to eliminate all single-letter words from a string in Java using pattern matching. I've coded it as follows:

    String inputStr = "P@";

    // remove single-character words and extra whitespace
    inputStr = inputStr.replaceAll("\\b[\\w']{1}\\b", "").replaceAll("\\s+", " ").trim();

I'm expecting P@ as the output, since the input is not a single-letter word. But I'm getting @, because the P is being removed. So it seems to consider only alphabetical characters when matching the pattern, whereas I want to match on the basis of the length of the string entered.

Please help.

paras2682

4 Answers

2

Try using this:

    String data = "asd df R# $R $$ $ 435 4ee 4";

    String replaceAll = data.replaceAll("(\\s.\\s)|(\\s.$)", " ");
    System.out.println(replaceAll);

Output is : asd df R# $R $$ 435 4ee
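A pattern that consumes the surrounding whitespace can miss a single character at the very start of the string, and also the second of two adjacent single characters, because the space they share has already been used by the previous match. Below is a minimal sketch of an alternative that uses zero-width lookarounds instead, assuming tokens are separated by whitespace only. The class name `SingleCharTokens` and the method `remove` are made up for illustration.

```java
import java.util.regex.Pattern;

class SingleCharTokens {
    // One non-whitespace character that has no non-whitespace neighbour on
    // either side. Lookarounds are zero-width, so the neighbouring spaces are
    // not consumed and adjacent one-character tokens are all matched.
    private static final Pattern SINGLE = Pattern.compile("(?<!\\S)\\S(?!\\S)");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Hypothetical helper: drops one-character tokens, then collapses
    // runs of whitespace and trims the ends.
    static String remove(String input) {
        String withoutSingles = SINGLE.matcher(input).replaceAll("");
        return SPACES.matcher(withoutSingles).replaceAll(" ").trim();
    }
}
```

With this sketch, `SingleCharTokens.remove("P@")` leaves `P@` unchanged, while `SingleCharTokens.remove("P")` returns an empty string.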

Ankur Shanbhag
0

Use this:

    str = str.replaceAll("(^.$|\\s.\\s|^.\\s|\\s.$)", " ").replaceAll("\\s+", " ").trim();

The problem with your solution is that \b matches at any boundary between a word character and a non-word character. In "P@", @ is a non-word character, so there is a word boundary between P and @, and P is matched as a single-character word.

\b

Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.
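To see this boundary behaviour directly, here is a small sketch that lists what the pattern from the question actually matches. The class name `WordBoundaryDemo` and the method `singleWordCharMatches` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Collects every substring matched by the question's pattern:
    // one word character (or apostrophe) with a word boundary on each side.
    static List<String> singleWordCharMatches(String input) {
        Matcher m = Pattern.compile("\\b[\\w']\\b").matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }
}
```

For the input `P@` the returned list contains just `P`, because the position between `P` and `@` counts as a word boundary. For `PQ` the list is empty, since there is no boundary between two word characters.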

REFER FOR REGULAR EXPRESSION

Meherzad
  • This is not a complete solution. It's a solution only for the mentioned example: if I change my string to "P", this will retain it as it is. But as I said, I want to eliminate single-letter words. – paras2682 Apr 02 '13 at 08:42
  • This solution does not take into account words that are not enclosed in whitespace (for instance, words at the start/end of the string, or words followed by a comma or period). – brimborium Apr 02 '13 at 08:47
  • @brimborium OP has mentioned that he needs the actual length of the string to be considered including other characters. – Meherzad Apr 02 '13 at 09:38
0

Try this regex:

\s([^\s]{1})\s

This should match a single non-whitespace character delimited by whitespace on either side. If you need to accept non-whitespace characters like ',' and '.' as delimiters, you will need to add those.
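As a rough sketch of what adding delimiters could look like in Java (note the doubled backslashes inside a string literal), the version below treats whitespace, ',' and '.' as delimiters. That delimiter set is only an assumption for illustration, as are the names `DelimitedSingles` and `removeSingles`. It uses lookarounds rather than consuming the delimiters, so the start and end of the string are handled too.

```java
import java.util.regex.Pattern;

class DelimitedSingles {
    // Assumed delimiter set: whitespace, comma and period.
    // Matches one non-delimiter character whose neighbours on both sides
    // are delimiters or the edges of the string.
    private static final Pattern SINGLE =
            Pattern.compile("(?<![^\\s,.])[^\\s,.](?![^\\s,.])");

    // Hypothetical helper: removes the matched characters, then collapses
    // whitespace. The punctuation itself is left in place.
    static String removeSingles(String input) {
        return SINGLE.matcher(input).replaceAll("")
                .replaceAll("\\s+", " ")
                .trim();
    }
}
```

Under these assumptions, `DelimitedSingles.removeSingles("I, x. went")` returns `, . went`: the one-character words are dropped but the commas and periods stay, so a real cleanup would probably need a further pass for stray punctuation.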

Mikkel Løkke
0

The test case is:

asd df R# $R $$ $ 435 4ee 4 hey buddy this is a test i@ wanted

"[!-~]?\\b[A-z]\\b[!-~]?"
"[!-~]?\\b[\\w]\\b[!-~]?"

The output for the two patterns above is:

asd df $$ $ 435 4ee 4 hey buddy this is test wanted
asd df $$ $ 435 4ee hey buddy this is test wanted

Notice that in the second output the 4 is missing. The second regex also removes single digits; I didn't know whether a single digit should count or not.
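One detail worth checking in the first pattern: the range `A-z` spans more than letters, because in ASCII the characters `[`, `\`, `]`, `^`, `_` and the backtick sit between `Z` and `a`. A quick sketch for comparing the character classes follows; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class LetterRangeCheck {
    // True if c falls in the ASCII range from 'A' to 'z', which also
    // includes the punctuation between 'Z' and 'a'.
    static boolean matchesAToLowerZ(char c) {
        return Pattern.matches("[A-z]", String.valueOf(c));
    }

    // True only for ASCII letters.
    static boolean matchesLettersOnly(char c) {
        return Pattern.matches("[A-Za-z]", String.valueOf(c));
    }

    // True for ASCII letters, digits and underscore (Java's default \w).
    static boolean matchesWordChar(char c) {
        return Pattern.matches("\\w", String.valueOf(c));
    }
}
```

Here `matchesAToLowerZ('^')` is true while `matchesLettersOnly('^')` is false, and only `matchesWordChar` accepts `'4'`, which is why the second pattern also removes the single digit.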

Lpc_dark