27

Can anyone explain the difference between &#32; and &nbsp;?

I have HTML data stored in a database in binary form, and a space in it can be either &#32; or &nbsp; or sometimes &#160;.

The issue is that when I convert this HTML to plain text using the Jsoup library, the conversion looks correct, but when I then use Java's String.contains(myString) method, the text that came from HTML with &#32; behaves differently from the text that came from HTML with &nbsp;. A string taken from one is not found in the other, and vice versa.

Example:

HTML1 : This&#32;is&#32;my&#32;test&#32;string

HTML2 : This&nbsp;is&nbsp;my&nbsp;test&nbsp;string

If I convert both to plain text using Jsoup, it returns:

HTML 1 : This is my test string

HTML 2 : This is my test string

But the two strings are still not equal. Why is that?

Ketan Bhavsar

5 Answers

46

&#32; is the classic space, the one you get when you hit the spacebar, written as its HTML entity equivalent.

&#160; and &nbsp; both represent the non-breaking space, often used to keep the browser from collapsing a run of spaces into one:

"&#32;&#32;&#32;&#32;" => " " (collapsed into a single space)

"&nbsp;&nbsp;&nbsp;&nbsp;" => "    " (not collapsed)

If you are parsing a string containing both classic and non-breaking spaces, you can safely replace one by the other.
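As a minimal sketch of that replacement in Java: the class and method names below are hypothetical, and the sketch assumes the plain text produced from the HTML still contains the non-breaking space character U+00A0, which is what &#160; and &nbsp; decode to.

```java
final class SpaceNormalizer {
    // Character that &#160; and &nbsp; decode to.
    private static final char NBSP = '\u00A0';

    /** Replaces every non-breaking space with an ordinary space (U+0020). */
    static String normalize(String text) {
        return text.replace(NBSP, ' ');
    }

    public static void main(String[] args) {
        String fromHtml1 = "This is my test string";                      // ordinary spaces
        String fromHtml2 = "This\u00A0is\u00A0my\u00A0test\u00A0string";  // non-breaking spaces

        // Looks identical when printed, but the characters differ.
        System.out.println(fromHtml1.equals(fromHtml2));                       // false
        // After normalizing both sides, comparison and search behave as expected.
        System.out.println(normalize(fromHtml1).equals(normalize(fromHtml2))); // true
        System.out.println(normalize(fromHtml2).contains("my test"));          // true
    }
}
```

Normalizing both the stored text and the search string before calling contains keeps the comparison symmetric.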

strnk
6

&#32; is just a space character, nothing more. Consecutive occurrences of this character collapse to a single space when rendered.

Whereas &#160; and &nbsp; both represent the non-breaking space character; when they occur one after another, they are not collapsed into a single space.

The only difference between them is that &#160; is the numeric character reference and &nbsp; is the named one.

All of these are HTML entities. You can learn more about them from the following links.

  1. Link 1
  2. Link 2
Starx
3

&#32; is the character reference for the ordinary space typed with the space key.

&#160; and &nbsp; are both character references for the non-breaking space.

If your data has come from different sources it may be possible that the space symbols have been encoded differently.

In direct comparison they will likely be shown as being different.
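One way to confirm this is to print the Unicode code points of each string. The helper below is a hypothetical diagnostic sketch, not part of any library; it only relies on the standard String.codePoints() method.

```java
final class CodePointDump {
    /** Returns the code points of the text, e.g. "U+0061 U+0020 U+0062". */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("a b"));        // U+0061 U+0020 U+0062
        System.out.println(describe("a\u00A0b"));   // U+0061 U+00A0 U+0062
    }
}
```

An ordinary space shows up as U+0020 and a non-breaking space as U+00A0, even though both look the same on screen.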

KingCronus
2

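From Java 8 onwards, the following should work:

string = string.replaceAll("\\h", " ");

where \h matches a horizontal whitespace character, as described here. Note that string.replace("\\h", " ") does not work for this, because String.replace treats its first argument as literal text rather than as a regular expression.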
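A small sketch of the regex approach follows; the class and method names are hypothetical. It relies on \h in java.util.regex.Pattern (Java 8+), which matches horizontal whitespace including the tab and the non-breaking space U+00A0. It also shows that String.replace, which takes literal text rather than a regex, leaves the input unchanged here.

```java
final class HorizontalWhitespace {
    /** Replaces each horizontal whitespace character with an ordinary space. */
    static String toPlainSpaces(String text) {
        return text.replaceAll("\\h", " ");
    }

    public static void main(String[] args) {
        String text = "a\u00A0b\tc";

        // Regex-based: the non-breaking space and the tab both become ordinary spaces.
        System.out.println(toPlainSpaces(text).equals("a b c"));    // true

        // Literal-based: looks for the two characters '\' and 'h', finds none.
        System.out.println(text.replace("\\h", " ").equals(text));  // true
    }
}
```

Because Java strings are immutable, the result has to be assigned or returned; calling replaceAll without using its return value has no effect.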

AP22
0

To complete the other answers...

Apart from preventing line breaks and the collapsing of multiple spaces, the two do not always render identically in HTML, even though most answers suggest otherwise (which is true in general).

Let's take an example :

<span>&#32;test</span> <br/>
<span>&#160;test</span>

The first span will not render a space at the beginning of the string, while the second span will. This is presumably part of the whitespace-collapsing behavior: https://en.wikipedia.org/wiki/Non-breaking_space.

So in this case, if you need this first space, the difference is important.

Lafdoma