
I have a requirement to remove unwanted characters from a String in Java. For example, the input String is:

Income ......................4,456
liability........................56,445.99

I want the output to be:

Income 4,456
liability 56,445.99

What is the best approach to write this in Java? I am parsing large documents, so it needs to be performance-optimized.

athenatechie

3 Answers


For this particular example, I might use the following replacement:

String input = "Income ......................4,456";
input = input.replaceAll("(\\w+)\\s*\\.+(.*)", "$1 $2");
System.out.println(input);

Here is an explanation of the pattern being used:

(\\w+)   match AND capture one or more word characters
\\s*     match zero or more whitespace characters
\\.+     match one or more literal dots
(.*)     match AND capture the rest of the line

The two quantities in parentheses are known as capture groups. The regex engine remembers what these were while matching, and makes them available, in order, as $1 and $2 to use in the replacement string.

Output:

Income 4,456
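Because the question mentions large documents, it may be worth noting that `str.replaceAll(regex, repl)` is specified to behave like `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so it compiles the pattern on every call. A minimal sketch that compiles the same pattern once and reuses it for each line read from a `BufferedReader` could look like this; the class name `LeaderCleaner` and method `clean` are hypothetical, and the input is just the two sample lines from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

class LeaderCleaner {
    // Compiled once and reused for every line; String.replaceAll compiles its regex on each call
    private static final Pattern LEADER = Pattern.compile("(\\w+)\\s*\\.+(.*)");

    static String clean(String line) {
        // $1 = the word before the dots, $2 = everything after them
        return LEADER.matcher(line).replaceAll("$1 $2");
    }

    public static void main(String[] args) throws IOException {
        // Sample input from the question; for a real file, Files.newBufferedReader(path) could be used instead
        String doc = "Income ......................4,456\n"
                   + "liability........................56,445.99\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(doc))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(clean(line));
            }
        }
    }
}
```

Running it should print `Income 4,456` and `liability 56,445.99`. Reading line by line also avoids loading the whole document into memory at once.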


Tim Biegeleisen
  • What exactly is this regular expression doing? I need to understand how exactly it works. – athenatechie Jun 13 '17 at 02:34
  • The expression worked for the example I illustrated. However, I ran into another String, shown below: Fire........................ .................................. ...........598,368..............598,368...... – athenatechie Jun 13 '17 at 14:04
  • Update your question and show us data covering _every_ edge case. Going back and forth like this with everyone isn't going to get you anywhere. – Tim Biegeleisen Jun 13 '17 at 14:52
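As for the `Fire` line in the comments: with the pattern above, `(.*)` captures everything after the first run of dots, so the later dot runs survive in the output. One possible sketch, assuming that any run of two or more dots (possibly separated by spaces) is a leader and that a single dot is always a decimal point, collapses each such run into one space and trims the result. The class `LeaderCollapser` and its method are hypothetical.

```java
import java.util.regex.Pattern;

class LeaderCollapser {
    // Optional spaces, then one or more runs of 2+ dots, each optionally followed by spaces.
    // A lone dot (e.g. the decimal point in 56,445.99) never matches.
    private static final Pattern LEADERS = Pattern.compile("[ ]*(?:\\.{2,}[ ]*)+");

    static String collapse(String line) {
        // Replace every leader run with a single space, then drop a trailing space left by a final run
        return LEADERS.matcher(line).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(collapse("Income ......................4,456"));          // Income 4,456
        System.out.println(collapse("liability........................56,445.99"));  // liability 56,445.99
        System.out.println(collapse(
            "Fire........................ .................................. ...........598,368..............598,368......")); // Fire 598,368 598,368
    }
}
```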

You can do this replacement with a single line of code:

System.out.println("asdfadf ..........34,4234.34".replaceAll("[ ]*\\.{2,}"," "));
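Applied to the two lines from the question, this pattern turns each run of optional spaces followed by two or more dots into a single space. Because it requires at least two dots, the decimal point in `56,445.99` is left alone. A small sketch with a precompiled pattern (the class name `DotRunReplacer` is hypothetical):

```java
import java.util.regex.Pattern;

class DotRunReplacer {
    // Zero or more spaces followed by at least two literal dots; a single '.' is not matched
    private static final Pattern DOT_RUN = Pattern.compile("[ ]*\\.{2,}");

    static String replace(String line) {
        return DOT_RUN.matcher(line).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(replace("Income ......................4,456"));          // Income 4,456
        System.out.println(replace("liability........................56,445.99"));  // liability 56,445.99
    }
}
```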
German

The best way to do that is:

String result = yourString.replaceAll("[-+.^:,]","");

That replaces each of these special characters with nothing, i.e. removes them.

Fady Saad
  • Your solution has a problem, namely that it would also strip the punctuation from the numbers on the right side of each line. – Tim Biegeleisen Jun 13 '17 at 02:27
  • It also replaces the decimal point in the second value (the one just before `99`), which is not desired. – Ken White Jun 13 '17 at 02:27
  • I agree, this is not the right solution. I thought about it the same way. Let me try the other solutions mentioned above. Thanks for the prompt responses. – athenatechie Jun 13 '17 at 02:33
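To illustrate the problem raised in these comments, here is a quick check of what the character-class replacement does to the two sample lines; the class name `CharClassStripDemo` is hypothetical. Because the class matches every `.` and `,` anywhere in the line, the thousands separators and the decimal point in the numbers are removed along with the dot leaders.

```java
class CharClassStripDemo {
    static String strip(String line) {
        // Removes every '-', '+', '.', '^', ':' and ',' anywhere in the line
        return line.replaceAll("[-+.^:,]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Income ......................4,456"));          // Income 4456
        System.out.println(strip("liability........................56,445.99"));  // liability5644599
    }
}
```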