How to trim no-break space in Java?

Question

I have an input file that I need to process, discarding all whitespace, including the non-breaking space U+00A0, aka &nbsp; (you can produce it in Notepad by holding Alt and typing 0 1 6 0 on the keyboard's numeric pad), or any other form of whitespace. I have tried String.trim(), but it doesn't trim U+00A0.

Do I need to explicitly check for U+00A0 and then trim(), or is there an easy way to trim all kinds of whitespace in Java?

Yup, replace worked. :) Didn't think of it earlier :| What is the difference between "all" & _all_? – Abhishek, Feb 03 '15 at 09:51
If the question is about removing _all_ no-break spaces inside a String, then the question is wrong and the accepted answer is perfect. If the question is about trimming no-break spaces, then the accepted answer is wrong. – ForguesR, Jun 27 '17 at 14:28
@ForguesR Can you please explain how the question or the answer is wrong? – Abhishek, Jun 28 '17 at 16:40

score 77 · Accepted Answer · edited Jun 20 '20 at 09:12

While U+00A0 (&nbsp;) is a non-breaking space (a space that does not want to be treated as whitespace), you can trim a string while preserving every U+00A0 within the string with a simple regex:

string.replaceAll("(^\\h*)|(\\h*$)","")

\h is a horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]

If you are using a JDK older than Java 8, you need to spell out the list of characters explicitly instead of using \h.
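A minimal sketch of both variants, for comparison. The class and method names (NbspTrim, trimHorizontal, trimHorizontalLegacy) are made up for illustration; the legacy character class is simply the list quoted above.

```java
import java.util.regex.Pattern;

class NbspTrim {
    // Explicit equivalent of \h (the list quoted above), for runtimes older than Java 8.
    private static final String H =
            "[ \\t\\xA0\\u1680\\u180e\\u2000-\\u200a\\u202f\\u205f\\u3000]";

    // Java 8+: \h matches horizontal whitespace, including U+00A0.
    private static final Pattern EDGES = Pattern.compile("^\\h+|\\h+$");

    // Same pattern, built from the explicit character class.
    private static final Pattern EDGES_LEGACY = Pattern.compile("^" + H + "+|" + H + "+$");

    // Removes leading and trailing horizontal whitespace; inner characters are kept.
    static String trimHorizontal(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    static String trimHorizontalLegacy(String s) {
        return EDGES_LEGACY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "\u00A0 \tfoo\u00A0bar\u00A0 ";
        // The no-break space between "foo" and "bar" is preserved in both cases.
        System.out.println("[" + trimHorizontal(s) + "]");
        System.out.println("[" + trimHorizontalLegacy(s) + "]");
    }
}
```

Precompiling the pattern avoids recompiling the regex on every call, which String.replaceAll would otherwise do.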

answered Feb 03 '15 at 09:44

Cfx

This is the cleanest and most general solution so far. Worth mentioning that `\h` is only available since Java 8 but in earlier versions, you can use the explicit range given in your answer. – 5gon12eder Feb 03 '15 at 09:55
That's brilliant! Exactly a one-liner which will take care of all kinds of spaces. – Abhishek Feb 03 '15 at 09:55
One thing that might be helpful to know is that these have a Unicode Classification of Space Separator. I like this page as a reference to what's included, as the Unicode official stuff is a bit dry: [Space Separators](http://www.fontspace.com/unicode/category/space-separator) – Steve Jun 30 '16 at 10:33
Why do you think `\u00A0` is not removed? Please see the [javadoc](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). It states that this char is included in `\h` and therefore removed. – Cfx Mar 14 '17 at 10:12
@Roland `Cfx` answered it for me :) – Abhishek Mar 14 '17 at 13:35
@Abhishek My bad, I didn't realize that `\xA0` == `'\u00A0'`. – Roland Mar 14 '17 at 14:00
And what about \r\n, \r and \n? They are not handled by your answer. – Mattew Eon Aug 09 '21 at 08:26
Well, when \h is used for horizontal whitespace, one could guess that \v would be used for vertical whitespace ([\n\x0B\f\r\x85\u2028\u2029]). Or you would just look in the [javadoc](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). **;-)** – Cfx Aug 19 '21 at 08:26

Rob Audenaerde · Answer 2 · 2015-02-03T09:53:56.410

48

U+00A0 is not treated as whitespace by trim(), so it won't be trimmed. But you can simply replace() that character with a space, and then call trim(), so you keep the spaces that are 'inside' your string.

string = string.replace('\u00A0',' ').trim()

There are three non-breaking whitespace characters that are excluded from the Character.isWhitespace() method: \u00A0, \u2007, and \u202F, so you probably want to replace those too.
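As a rough sketch, the three characters listed above can be checked and replaced together. The class and method names here are hypothetical, and note that inner no-break spaces also become ordinary spaces with this approach.

```java
class NbspReplace {
    // Replaces the three non-breaking space characters with U+0020, then trims.
    // Inner occurrences are converted to ordinary spaces as a side effect.
    static String trimTreatingNbspAsSpace(String s) {
        return s.replace('\u00A0', ' ')
                .replace('\u2007', ' ')
                .replace('\u202F', ' ')
                .trim();
    }

    public static void main(String[] args) {
        // Character.isWhitespace() reports false for each of the three characters.
        for (char c : new char[] {'\u00A0', '\u2007', '\u202F'}) {
            System.out.printf("U+%04X isWhitespace=%b%n", (int) c, Character.isWhitespace(c));
        }
        System.out.println("[" + trimTreatingNbspAsSpace("\u00A0foo\u202Fbar\u2007") + "]");
    }
}
```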

edited Feb 03 '15 at 09:53

answered Feb 03 '15 at 09:36

Rob Audenaerde

It worked!! Thanks :) I assume, I need to handle all the whitespaces (http://en.wikipedia.org/wiki/Whitespace_character) explicitly & one-by-one, right? – Abhishek Feb 03 '15 at 09:47
1

`trim()` will take care of all the characters that are listed as java whitespace, so you don't need to add all whitespace characters. See here: http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-char- – Rob Audenaerde Feb 03 '15 at 09:50
But this will change the "inner" NBSPs to normal spaces, which might not be what you want. "&nbsp;foo&nbsp;35&nbsp;" will become "foo 35", not "foo&nbsp;35", which is what would be expected of trim. – Viktor Mellgren May 04 '21 at 09:07
1

@ViktorMellgren yes, but OP asked for: I've input an input file which I need to process and *discard all the white-spaces* – Rob Audenaerde May 05 '21 at 07:38

score 14 · Answer 3 · answered Feb 24 '20 at 06:42

You can try this:

string.replaceAll("\\p{Z}","");

From https://www.regular-expressions.info/unicode.html:

\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
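A small sketch of what this does in practice (class and method names are invented for illustration). One thing to be aware of: \p{Z} covers the Unicode separator categories, so a tab or newline, which are control characters, is not matched; the second method adds \s as one possible way to cover those too.

```java
class SeparatorStrip {
    // Removes every Unicode separator (space, line, and paragraph separators),
    // anywhere in the string, not only at the ends.
    static String removeSeparators(String s) {
        return s.replaceAll("\\p{Z}", "");
    }

    // Also removes the ASCII whitespace matched by \s (tab, newline, and so on).
    static String removeAllWhitespace(String s) {
        return s.replaceAll("[\\p{Z}\\s]", "");
    }

    public static void main(String[] args) {
        String s = "a\u00A0b\u2003c d\te";
        System.out.println("[" + removeSeparators(s) + "]");    // the tab is still there
        System.out.println("[" + removeAllWhitespace(s) + "]");
    }
}
```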

logbasex

score 4 · Answer 4 · answered Jun 27 '17 at 15:53

If you happen to use Apache Commons Lang then you can use strip and add all the characters you want.

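import org.apache.commons.lang3.StringUtils;  // Commons Lang 3; in 2.x the package is org.apache.commons.lang

// Characters to strip from both ends. This is a plain list, not a regex, so ranges are not expanded.
final String STRIPPED_CHARS = " \t\u00A0\u1680\u180e\u2000\u200a\u202f\u205f\u3000";

String s = "\u3000 \tThis str contains a non-breaking\u00A0space and a\ttab. ";
s = StringUtils.strip(s, STRIPPED_CHARS);  // only leading and trailing characters are removed
System.out.println(s);  // Gives : "This str contains a non-breaking space and a    tab."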
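If pulling in Commons Lang is not an option, the same idea can be sketched with the JDK alone. This is a hand-rolled stand-in, not the library's implementation, and the names (StripChars, strip) are chosen only for the example.

```java
class StripChars {
    // Strips any of the given characters from both ends of s; inner characters are kept.
    static String strip(String s, String chars) {
        int start = 0;
        int end = s.length();
        // Advance past leading characters that appear in the strip list.
        while (start < end && chars.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        // Back up over trailing characters that appear in the strip list.
        while (end > start && chars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String stripped = strip("\u3000 \tfoo\u00A0bar. ", " \t\u00A0\u3000");
        System.out.println("[" + stripped + "]");
    }
}
```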

score 3 · Answer 5 · edited Jun 05 '19 at 17:51

You could do it with a guava CharMatcher, for example:

CharMatcher.anyOf("\r\n\t \u00A0").trimFrom(input);
CharMatcher.whitespace().trimFrom(input);

See also this nice reference on whitespaces definition

Stephen M -on strike-

answered Feb 03 '15 at 09:39

There are far more whitespace characters than you have put in your list. – Rob Audenaerde Feb 03 '15 at 09:40
The link to whitespace definition is dead. – izogfif Jul 11 '18 at 13:24
