It is faster and simpler for handling supplementary characters (by not handling them).
Java represents characters as 16-bit char values, but Unicode has grown to contain more than 64K characters. So some characters, the supplementary characters, have to be encoded as two char values (a surrogate pair) in Java.
Strict UTF-8 requires the encoder to combine each surrogate pair into a single character (code point) and then encode that character into bytes. The decoder has to split supplementary characters back into surrogate pairs.
chars -> character -> bytes -> character -> chars
Since both ends are Java, we can take a shortcut and encode directly at the char level:
char -> bytes -> char
Neither the encoder nor the decoder needs to worry about surrogate pairs.
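
The difference can be seen in byte counts. The sketch below assumes the shortcut described here is the modified UTF-8 written by DataOutputStream.writeUTF, which the text above does not name explicitly. The class name SurrogateEncodingDemo and its helper methods are made up for illustration. It encodes one supplementary character (U+1F600) both ways: strict UTF-8 combines the surrogate pair into one 4-byte sequence, while the char-level encoding writes each surrogate as its own 3-byte sequence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class SurrogateEncodingDemo {
    // U+1F600 is a supplementary character, stored in Java as two chars.
    static final String SAMPLE = "\uD83D\uDE00";

    // Strict UTF-8: chars -> character -> bytes.
    static int strictUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Char-level encoding (assumed here to be writeUTF's modified UTF-8).
    static byte[] writeCharLevel(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(s);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Payload size without the 2-byte length prefix that writeUTF adds.
    static int charLevelLength(String s) {
        return writeCharLevel(s).length - 2;
    }

    // bytes -> char: decode what writeCharLevel produced.
    static String roundTrip(String s) {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(writeCharLevel(s)))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("chars:            " + SAMPLE.length());
        System.out.println("code points:      " + SAMPLE.codePointCount(0, SAMPLE.length()));
        System.out.println("strict UTF-8:     " + strictUtf8Length(SAMPLE) + " bytes");
        System.out.println("char-level bytes: " + charLevelLength(SAMPLE) + " bytes");
        System.out.println("round trip ok:    " + SAMPLE.equals(roundTrip(SAMPLE)));
    }
}
```

The char-level form costs two extra bytes per supplementary character, and its output is not valid strict UTF-8, so it only makes sense when both the writer and the reader agree on it.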