The problem is not in Java. When encoded as UTF-8, the Thai string "สวย" gives the bytes 0xe0, 0xb8, 0xaa, 0xe0, 0xb8, 0xa7, 0xe0, 0xb8, 0xa2.
In Latin1, 0xe0 is à, 0xaa is ª, 0xa2 is ¢, and the other bytes (0xb8 and 0xa7) apparently could not be displayed, which gives the ? characters.
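The byte values above can be checked with a minimal sketch like the following. The class and method names (Utf8BytesDemo, utf8Hex, misreadAsLatin1) are made up for illustration and are not from the original question; the string is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class Utf8BytesDemo {
    // Returns the UTF-8 bytes of s as space-separated lowercase hex.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulates a display that reads the UTF-8 bytes as Latin1 (ISO-8859-1):
    // every byte becomes one character instead of three bytes per Thai letter.
    public static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String thai = "\u0e2a\u0e27\u0e22"; // the three Thai letters of the example
        System.out.println(utf8Hex(thai));                  // e0 b8 aa e0 b8 a7 e0 b8 a2
        System.out.println(misreadAsLatin1(thai).length()); // 9
    }
}
```

Three Thai letters become nine one-byte characters when misread, which matches the pattern of à, ª and ¢ interleaved with characters the display could not show.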
That means that println has done its part of the job, but whatever should have displayed the characters (terminal screen or IDE) cannot process UTF-8, or was not instructed to.
Unfortunately, the Windows console is not really Unicode friendly. Recent versions (>= Win 7) support a so-called UTF-8 code page (chcp 65001), which correctly processes UTF-8 byte strings provided its underlying charset can display the characters. For example, after typing chcp 65001, my French system successfully displays all accented characters (éèùïêçàâ...) when they are UTF-8 encoded, but cannot display your example Thai string.
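On the Java side, one way to make sure the bytes reaching the console really are UTF-8, whatever the JVM's default encoding, is to wrap the output in a PrintStream with an explicit charset. This is only a sketch: the class and helper names (Utf8Console, utf8Stream) are hypothetical, and it assumes the console has already been switched with chcp 65001 and can display the characters.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the original question.
class Utf8Console {
    // Wraps an output stream so that text printed to it is encoded as UTF-8,
    // with auto-flush enabled.
    public static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Whether this displays correctly still depends on the console.
        out.println("\u0e2a\u0e27\u0e22");
    }
}
```

This only controls what Java sends; as with the Thai example above, the console may still fail to display characters it has no glyphs for.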
If you need a truly UTF-8 capable console on Windows, you can try the excellent ConEmu.