It does not read Unicode code points; it reads UTF-16 code units. The two are the same for code points up to 0xFFFF, but each code point above 0xFFFF is represented by two code units (a surrogate pair), because a 16-bit value cannot exceed 0xFFFF.
So in this case:
byte[] a = {-16, -96, -128, -128}; //UTF-8 for U+20000
ByteArrayInputStream is = new ByteArrayInputStream(a);
InputStreamReader r = new InputStreamReader(is, Charset.forName("UTF-8"));
int whatIsThis = r.read();
int whatIsThis2 = r.read();
System.out.println(whatIsThis); // 55360 (0xD840): high surrogate, not a valid standalone code point
System.out.println(whatIsThis2); // 56320 (0xDC00): low surrogate, not a valid standalone code point
Putting the two surrogate values together gives 0x20000:
((55360 - 0xD800) * 0x400) + (56320 - 0xDC00) + 0x10000 == 0x20000
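If you want whole code points out of a Reader, you can do that combination yourself. The following is only a minimal sketch: `readCodePoint` and the class name are hypothetical names of my own, not part of the JDK, and it assumes you want an exception on an unpaired high surrogate. It relies on the standard `Character.isHighSurrogate`, `Character.isLowSurrogate` and `Character.toCodePoint` methods, which perform the same arithmetic as the formula above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CodePointReaderSketch {

    // Hypothetical helper: reads one full code point, or returns -1 at end of stream.
    public static int readCodePoint(Reader r) throws IOException {
        int first = r.read(); // one UTF-16 code unit, or -1
        if (first < 0 || !Character.isHighSurrogate((char) first)) {
            // End of stream, a BMP character, or a stray low surrogate: return as-is.
            return first;
        }
        int second = r.read();
        if (second < 0 || !Character.isLowSurrogate((char) second)) {
            // Assumption: treat an unpaired high surrogate as an error.
            throw new IOException("Unpaired high surrogate: " + first);
        }
        // Same computation as ((hi - 0xD800) * 0x400) + (lo - 0xDC00) + 0x10000.
        return Character.toCodePoint((char) first, (char) second);
    }

    public static void main(String[] args) throws IOException {
        byte[] a = {-16, -96, -128, -128}; // UTF-8 for U+20000
        Reader r = new InputStreamReader(new ByteArrayInputStream(a), StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(readCodePoint(r))); // 20000
        System.out.println(readCodePoint(r));                      // -1 (end of stream)
    }
}
```

If the text is already in a String, `String.codePointAt` or `String.codePoints()` give you code points directly without any manual pairing.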
More about how UTF-16 works: http://en.wikipedia.org/wiki/UTF-16