
I am basing this on this article: https://kishuagarwal.github.io/unicode.html


As an example, I took the Unicode code point 0x1F9F0 and tried to encode it in UTF-16.

In hex:

0x1F9F0

In binary:

0001 1111 1001 1111 0000

Following the explanation from the article, I should have something like this:

1101 10XX XXXX XXXX 1101 11XX XXXX XXXX

Populating the free bits with the bits of the code point gives me

binary:

1101 1000 0111 1110 1101 1101 1111 0000

hex:

\uD87E \uDDF0

But according to this page, the correct value is:

hex:

\uD83E\uDDF0

binary:

1101 1000 0011 1110 1101 1101 1111 0000

So...

       my hex: \uD87E \uDDF0
  correct hex: \uD83E \uDDF0

I have a single bit misplaced, and I can't figure out why...
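For reference, this is how I cross-checked the expected result (a minimal sketch in Python, using the standard `utf-16-be` codec; not from the article):

```python
# Ask Python's built-in codec for the UTF-16 (big-endian) encoding of U+1F9F0.
encoded = "\U0001F9F0".encode("utf-16-be")
print(encoded.hex())  # d83eddf0 -> high surrogate 0xD83E, low surrogate 0xDDF0
```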


1 Answer


Converting 0x1F9F0 (0001 1111 1001 1111 0000)

From the article you posted, we follow the part:

For the unicode codepoints from U+010000 to U+10FFFF, ...

and the first step, which you probably missed:

Firstly 0x010000 is subtracted from the code point, giving us a 20-bit number in the range 0x000000 to 0x0FFFFF.

that is, 0x0F9F0 (0000 1111 1001 1111 0000)
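To make that step concrete, here is a minimal sketch in Python (the variable names are mine):

```python
code_point = 0x1F9F0
offset = code_point - 0x10000       # the subtraction step that was skipped
print(hex(offset))                  # 0xf9f0
print(format(offset, "020b"))       # 00001111100111110000 (a 20-bit value)
```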

UTF-16 divides this range into two buckets 0xD800...0xDBFF and 0xDC00...0xDFFF (let’s call them A and B), where each bucket has 10 free bits and 6 fixed bits (shown in grey in the image).

or, as you already posted: 1101 10XX XXXX XXXX and 1101 11XX XXXX XXXX

The 20-bit number that we got above after subtracting is now divided into two parts of 10 bits each. The first 10 bits are used to fill the 10 free bits of A, while the remaining 10 bits are used to fill the 10 free bits of B.

resulting in 1101 1000 0011 1110 and 1101 1101 1111 0000, or 0xD83E 0xDDF0, as expected.
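Putting all three steps together (a minimal sketch in Python; the helper name `to_surrogate_pair` is mine, not from the article, and the result is cross-checked against Python's built-in `utf-16-be` codec):

```python
def to_surrogate_pair(code_point):
    """Encode a code point in U+010000..U+10FFFF as a UTF-16 surrogate pair."""
    offset = code_point - 0x10000       # 20-bit value in 0x00000..0xFFFFF
    high = 0xD800 | (offset >> 10)      # top 10 bits fill bucket A
    low = 0xDC00 | (offset & 0x3FF)     # bottom 10 bits fill bucket B
    return high, low

high, low = to_surrogate_pair(0x1F9F0)
print(hex(high), hex(low))              # 0xd83e 0xddf0

# Cross-check against the built-in codec.
expected = "\U0001F9F0".encode("utf-16-be")
assert bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF]) == expected
```

This also shows exactly where your single bit went wrong: without the `- 0x10000`, the top 10 bits of 0x1F9F0 come out as 0x7E instead of 0x3E, giving 0xD87E instead of 0xD83E for the high surrogate.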

True, my mistake was that I read it wrong: `For the unicode codepoints from U+010000 to U+10FFFF`; instead I read: `For the unicode codepoints from U+010000 to U+10FFF`. – Bruno Rozendo Jul 15 '19 at 12:43