
I have some string data like

&#55357;&#56842;

These are the two halves of a UTF-16 surrogate pair, written as decimal HTML entities.

How can I convert them to a single Unicode code point in Java, so that my client receives one decimal HTML entity instead of the surrogate pair?

Example: for the string above, I want to get &#128522;

KristofMols

1 Answer


Assuming you already parsed the string to get the 2 numbers, just create a String from those two char values:

String s = new String(new char[] { 55357, 56842 });
System.out.println(s);

Output

😊

To get the code point of that:

s.codePointAt(0) // returns 128522

You don't have to create a string though:

Character.toCodePoint((char) 55357, (char) 56842) // returns 128522
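
For the parsing step that the answer assumes is already done, here is a minimal sketch of one way to rewrite the whole string. The class name `SurrogateEntities`, the method `mergeSurrogateEntities`, and the regular expression are hypothetical and not part of the original answer. The sketch assumes the entities are decimal and written without spaces, as in `&#55357;&#56842;`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SurrogateEntities {
    // Hypothetical pattern: two adjacent decimal entities such as "&#55357;&#56842;".
    // Digits are capped at 7 so Integer.parseInt cannot overflow.
    private static final Pattern PAIR = Pattern.compile("&#(\\d{1,7});&#(\\d{1,7});");

    // Replaces each high/low surrogate entity pair with one code point entity.
    // All other text, including other entities, is copied through unchanged.
    static String mergeSurrogateEntities(String input) {
        Matcher m = PAIR.matcher(input);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find(pos)) {
            int hi = Integer.parseInt(m.group(1));
            int lo = Integer.parseInt(m.group(2));
            boolean isPair = hi <= 0xFFFF && lo <= 0xFFFF
                    && Character.isSurrogatePair((char) hi, (char) lo);
            if (isPair) {
                // Same arithmetic as 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
                int codePoint = Character.toCodePoint((char) hi, (char) lo);
                out.append(input, pos, m.start());
                out.append("&#").append(codePoint).append(';');
                pos = m.end();
            } else {
                // Keep the first entity as-is and retry starting at the second one,
                // because the second entity may be the high half of a real pair.
                int secondEntityStart = m.start(2) - 2; // step back over "&#"
                out.append(input, pos, secondEntityStart);
                pos = secondEntityStart;
            }
        }
        out.append(input, pos, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mergeSurrogateEntities("&#55357;&#56842;")); // prints &#128522;
    }
}
```

Unpaired surrogate entities are left untouched here. Whether to drop them or report an error instead depends on what the client expects.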
Andreas