Unable to print Thai string value in Java console

public static void main(String[] args) {
    String engParam = "Beautiful";
    String thaiParam = "สวย";
    System.out.println("Output :" + engParam + ":::" + thaiParam);
}

Output is showing like:

Output :Beautiful:::à?ªà??à?¢

I think System.out.println cannot print UTF-8 characters with the default console settings. Is there any other way to resolve this issue? Any help would be appreciated.

AritraDB
  • Most likely there is a problem with your console. Which console are you using: the IDE's built-in one, the Windows command prompt, or something else? Try playing with its settings. – Amongalen Dec 13 '19 at 12:17
  • Windows command prompt – AritraDB Dec 18 '19 at 02:51
  • Windows command prompt / PowerShell. Let me clarify the whole scenario. Yes, I can print it in the Eclipse IDE with some IDE-specific configuration changes, but I can't use the IDE in a cloud server / deployment environment (though creating a WAR file and deploying it to a Tomcat server is a good option). That's why I'm trying a standalone program from Windows PowerShell / the Windows command prompt. – AritraDB Dec 18 '19 at 03:36

6 Answers

One cannot easily change the Windows console's encoding, so write to a .txt file instead. To help Windows detect the UTF-8 encoding, you can write an invisible BOM character, "\ufeff", at the beginning of the file.

String text = "\uFEFF" + "Output :" + engParam + ":::" + thaiParam;
Path path = Paths.get("temp.txt");
Files.write(path, Collections.singletonList(text)); // Writes in UTF-8
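
As a self-contained sketch of the same idea, the program below writes the line with a leading BOM and then reads the raw bytes back to confirm that the file starts with EF BB BF, the UTF-8 encoding of U+FEFF. The class name BomFileDemo, the helper writeWithBom and the temporary file are illustrative choices of mine, not part of the original snippet. The Thai text is written with Unicode escapes so that the result does not depend on the encoding javac assumes for the source file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

class BomFileDemo {

    // Writes one line, prefixed with the BOM character U+FEFF, as UTF-8.
    static Path writeWithBom(Path path, String line) throws IOException {
        String text = "\uFEFF" + line;
        // Files.write(Path, Iterable) encodes in UTF-8 by default.
        return Files.write(path, Collections.singletonList(text));
    }

    public static void main(String[] args) throws IOException {
        String thaiParam = "\u0E2A\u0E27\u0E22"; // the Thai word from the question
        Path path = Files.createTempFile("thai", ".txt"); // hypothetical location
        writeWithBom(path, "Output :Beautiful:::" + thaiParam);

        byte[] bytes = Files.readAllBytes(path);
        // The first three bytes should be the UTF-8 BOM: EF BB BF.
        System.out.printf("%02X %02X %02X%n", bytes[0], bytes[1], bytes[2]);
        System.out.println("Wrote " + path);
    }
}
```

Open the reported file in Notepad or Notepad++ to check that the Thai text displays as expected.
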
Joop Eggen
  • Tried. Within the temp.txt, I found "Output :John_help:::สวย" – AritraDB Dec 13 '19 at 12:46
  • Opened with Notepad? – Joop Eggen Dec 13 '19 at 12:50
  • The Java console seems okay. In general, the Java compiler **javac** could use another encoding than the editor, but as Serge Ballesta investigated, UTF-8 seems to be used (fine). Also try a programmer's editor like the Java console or Notepad++. – Joop Eggen Dec 13 '19 at 12:56
  • "One cannot easily change a Windows' console encoding", Can `chcp` do this, for example `chcp 65001`? – Geno Chen Dec 15 '19 at 03:00
  • @GenoChen you might consider giving an answer to show how to change the code page _and_ restore it. I am not that familiar with Windows, and though I knew of chcp I found my solution less invasive. – Joop Eggen Dec 16 '19 at 10:01
  • @JoopEggen OK. I posted my answer a moment ago. – Geno Chen Dec 16 '19 at 10:32
  • @JoopEggen Yes, I opened it in Notepad / Notepad++. OK, let me clarify the whole scenario. Yes, I can print it in the Eclipse IDE with some IDE-specific configuration changes, but I can't use the IDE in a cloud server / deployment environment. That's why I'm trying a standalone program from the Windows command prompt. – AritraDB Dec 18 '19 at 03:00
  • Bad luck that you did not pick Linux. I doubt a Linux subsystem is available? If it is, its command prompt should work. – Joop Eggen Dec 18 '19 at 09:14

The problem is not in Java. When encoded in UTF-8, the Thai string "สวย" gives the bytes 0xe0, 0xb8, 0xaa, 0xe0, 0xb8, 0xa7, 0xe0, 0xb8, 0xa2.

In Latin1, 0xe0 is à, 0xaa is ª and 0xa2 is ¢; the other bytes have no representation, which gives the ? characters.

That means that the println has done its part of the job but that the thing that should have displayed the characters (terminal screen or IDE) cannot or was not instructed to process UTF8.
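
The byte values above can be reproduced with a few lines of Java. This is only a sketch: the class name Utf8AsLatin1Demo and its helpers toHex and misread are names I made up for illustration. It encodes the Thai string as UTF-8, prints the bytes in hex, and then decodes those same bytes as ISO-8859-1 to mimic a display that was not told to expect UTF-8.

```java
import java.nio.charset.StandardCharsets;

class Utf8AsLatin1Demo {

    // Formats bytes as space-separated lowercase hex, e.g. "e0 b8 aa".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decodes UTF-8 bytes with the wrong charset (Latin-1): one char per byte.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String thai = "\u0E2A\u0E27\u0E22"; // the Thai word from the question
        System.out.println(toHex(thai.getBytes(StandardCharsets.UTF_8)));
        // prints: e0 b8 aa e0 b8 a7 e0 b8 a2
        System.out.println(misread(thai).length()); // prints: 9
    }
}
```

Three Thai characters become nine bytes, and a reader that treats each byte as one character shows nine symbols instead of three.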


Unfortunately, the Windows console is not really Unicode friendly. Recent versions (>= Win 7) support a so-called UTF-8 code page (chcp 65001), which correctly processes UTF-8 byte strings provided its underlying charset can display the characters. For example, after typing chcp 65001, my French system successfully displays all accented characters (éèùïêçàâ...) when they are UTF-8 encoded, but cannot display your example Thai string.

If you need a truly UTF-8 capable console on Windows, you can try the excellent ConEmu.

Serge Ballesta
  • I do not know what *a java console* is. I know what a Windows console is, or a Linux `xterm` (or similar) terminal emulator. In both cases you can configure them to process UTF-8. I do not know about Mac or IDEs. – Serge Ballesta Dec 14 '19 at 09:29
  • Yup, sorry for the miscommunication; I meant the Windows command prompt. OK, let me clarify the whole scenario. Yes, I can print it in the Eclipse IDE with some IDE-specific configuration changes, but I can't use the IDE in a cloud server / deployment environment. That's why I'm trying a standalone program from the Windows command prompt. – AritraDB Dec 18 '19 at 02:56

You don't specify your environment, but this approach worked for me on Windows 10 from within my IDE, and also from the Command window:

  • First, use a font that supports Thai characters. But also make sure that the font you choose can be set in the Command window, and not just within your IDE. Some can (e.g. Courier Mono Thai), and some can't (e.g. Angsana New). You can mess with the Registry to add font selections, but Courier Mono Thai was available by default, so I used that one.
  • Once you have identified a font that you can set in the Command window, you can probably use that in your IDE as well if its default font(s) can't handle Thai characters.

Here are the steps to get things working:

  • Download font Courier Mono Thai. You can download it from several web sites but I got it from here.
  • Install the downloaded font. On Windows 10 all you have to do is select it (Courier_MonoThai.ttf) in File Explorer, right click, and select Install from the context menu.
  • Once the font is installed, make it the default font in the Command window. Open a Command window, click the icon in the top left corner, select Properties and then select Courier Mono Thai as your font. [Screenshot: CmdFont]
  • Run the application in your IDE. If the source code or the output doesn't render the Thai characters correctly, change the font. I used Courier Mono Thai in NetBeans, and everything looked good. [Screenshot: NetBeansWindow]
  • Finally, run in the Command window. The Thai characters probably won't render correctly. To fix that, change the code page to the one that supports Thai (chcp 874) before running your application. [Screenshot: cmdRun]

These instructions are specific to Windows 10. If you are running in a different environment update your question with full details of your platform and your IDE.
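
To see why the code page matters, it can help to look at the bytes a given charset produces for the Thai text. The sketch below is illustrative only: the class CodePageBytesDemo and the helper encodeToHex are my own names, and I am assuming that the Java charset called x-windows-874 corresponds to the console's code page 874. That charset ships with the JDK's extended charsets and may be missing from a trimmed runtime, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePageBytesDemo {

    // Encodes text with the given charset and formats the bytes as hex.
    static String encodeToHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String thai = "\u0E2A\u0E27\u0E22"; // the Thai word from the question

        // US-ASCII has no Thai letters: each character becomes '?' (0x3f).
        System.out.println("US-ASCII: " + encodeToHex(thai, StandardCharsets.US_ASCII));

        // Assumption: x-windows-874 is the Java name for code page 874.
        String name = "x-windows-874";
        if (Charset.isSupported(name)) {
            System.out.println(name + ": " + encodeToHex(thai, Charset.forName(name)));
        } else {
            System.out.println(name + " is not available in this JDK");
        }
    }
}
```

A charset without Thai letters replaces them with question marks, while a Thai code page needs only one byte per character. The console then has to interpret those bytes with the matching code page and a font that contains the glyphs.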


Updated 12/15/19 to provide an alternative approach:

Instead of using Code page 874 (Thai) from the Command window, you could do this instead:

  • Create a PrintStream that uses the UTF-8 charset, and write the output using that PrintStream.
  • In the Command window, use code page 65001 (UTF-8).

Here's the code:

package thaicharacters;

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ThaiCharacters {

    public static void main(String[] args) throws UnsupportedEncodingException {

        String engParam = "Beautiful";
        String thaiParam = "สวย";

        // Write the output to a UTF-8 PrintStream:
        PrintStream ps = new PrintStream(System.out, true, StandardCharsets.UTF_8.name());
        ps.println("UTF-8: " + engParam + ":::" + thaiParam);
    }
}

And here's the output in the Command window, showing that:

  • The Thai characters are not rendered correctly when using the default code page (437), or the Thai code page (874).
  • The Thai characters render correctly using the UTF-8 code page (65001).

[Screenshot: chcp65001]
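
If the program has many existing System.out.println calls, a possible variation is to replace System.out once at startup instead of passing a separate PrintStream around. This is a sketch under my own assumptions, not something tested in the environment above: the class Utf8StdoutDemo and the helper utf8Stream are hypothetical names, and the console still needs chcp 65001 and a font that can render Thai.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class Utf8StdoutDemo {

    // Wraps any OutputStream in an auto-flushing PrintStream that encodes as UTF-8.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Replace System.out once; later println calls then emit UTF-8 bytes.
        System.setOut(utf8Stream(new FileOutputStream(FileDescriptor.out)));
        System.out.println("UTF-8: Beautiful:::" + "\u0E2A\u0E27\u0E22");
    }
}
```

The helper takes a plain OutputStream so the encoding can also be checked against an in-memory buffer without involving the console.
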

skomisa
  • Appreciate your effort. – AritraDB Dec 18 '19 at 03:27
  • Looks like it's not working. From Windows PowerShell I'm still getting "**UTF-8: Beautiful:::???**" with chcp 65001. – AritraDB Dec 18 '19 at 03:40
  • OK. [1] What font are you using in PowerShell? [2] What about if you run from the Command window (cmd.exe) when using a font that can render Thai characters? – skomisa Dec 18 '19 at 03:54
  • I just ran from PowerShell and it worked for me using font _Courier Mono Thai_. Perhaps open a new PowerShell window after changing the font? Also, note that if you use `chcp 65001` in the PowerShell or Command window, then you must use `PrintStream ps = new PrintStream(System.out, true, StandardCharsets.UTF_8.name());` in your code. And if you use `chcp 874` then you must use `System.out.println("Default: " + engParam + ":::" + thaiParam);` in your code, as shown in my answer. – skomisa Dec 18 '19 at 04:11
  • With **chcp 65001**, it works for me with the font you have mentioned. Thanks a lot :). Appreciate your work. – AritraDB Dec 19 '19 at 03:31

This answer to a similar question might apply to your case if you are using Eclipse (it should be almost the same in IntelliJ).

Jimi
  • Yes, I have already gone through that. In that case I have to make some changes to the Eclipse configuration, but when running a plain standalone Java program, it's not working. – AritraDB Dec 14 '19 at 08:00
  • An answer that just links to another answer should be posted as a comment instead. – skomisa Dec 15 '19 at 02:56

This answer assumes that:

  1. You are using Windows.
  2. The "Java console" you mentioned is an invocation of Command Prompt (you may not be aware of this if you are using an IDE; cmd and IntelliJ IDEA certainly work this way, though I don't know whether Eclipse or other IDEs do).
  3. My guess was right :-)

Open Registry Editor (regedit), navigate to "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Command Processor", and create a REG_EXPAND_SZ value named AutoRun with the data chcp 65001. Then try again (no reboot required).

Actually, this is an example of creating and using an "initscript" for cmd.exe. It may be the way for us to change the de facto "default" console encoding to UTF-8 (codepage 65001) without changing too much of the system configurations.

To restore the original behavior, simply delete the AutoRun value.

Geno Chen
  • Did you try your solution? I don't think it will work for two reasons. The `PrintStream` used by `println()` in the OP's example will not be writing UTF-8, so setting the code page to 65001 (UTF-8) won't help. Also, even if that was fixed, unless the Command window is using a font that can render Thai characters they will not be rendered correctly. Also, making the Command window's code page default to 65001 is nothing to do with the question. – skomisa Dec 16 '19 at 16:50
  • @skomisa No, I didn't try my solution, because I don't have the environment mentioned by the OP (though the OP's environment is not so clear). But I have tried it and successfully solved my own encoding issue while running a program in the Run window of the Intelli-Haskell plugin for IntelliJ IDEA. This made me think that if the 2nd assumption in my answer holds, the solution works. – Geno Chen Dec 16 '19 at 17:07
  • You are right that the environment details are unclear. But regardless, `PrintStream()` will always use the "[platform's default character encoding if not specified](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/PrintStream.html)", and since your answer assumed a Windows platform, that default is unlikely to be UTF-8. And because you are setting the code page to 65001, the `PrintStream` should be writing UTF-8 characters, but it isn't. Also, the font being used must support Thai characters, which is unlikely for the Command window in most environments. – skomisa Dec 16 '19 at 17:32
  • @skomisa For the two (actually three) reasons in your comment, I **think** (because I don't have the environment) 1. We have enough evidence that this string will be output as UTF-8 (one from the analysis of the garbled output in Serge Ballesta's answer, one from the Java spec (JLS 3.1)... OK, the Java spec says Java strings use UTF-16). – Geno Chen Dec 16 '19 at 17:39
  • @skomisa 2. Yes, the font can be the issue. I have tested that only the font "SimSun-ExtB" supports Thai text among the fonts listed in my console settings. 3. Testing via a `.bat` file, I found that the relationship between the code page and the encoding of the source does affect the output. (Chinese characters: (saving as GBK, using `chcp 936`) -> normal output, (saving as GBK, using `chcp 65001`) -> garbled output, (saving as UTF-8, using `chcp 65001`) -> normal, (saving as UTF-8, using `chcp 936`) -> garbled.) – Geno Chen Dec 16 '19 at 17:39

Set the environment variable JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8 and, in cmd, use chcp 65001.
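
To check whether the setting was picked up, a small diagnostic like the following may help. It is only a sketch: the class EncodingInfoDemo and the method describe are names I chose for illustration. It reports the file.encoding system property and the JVM's default charset. Depending on the JDK version, System.out may pick its encoding separately from file.encoding, so treat the result as a first check rather than proof.

```java
import java.nio.charset.Charset;

class EncodingInfoDemo {

    // Summarizes the settings that influence how text is encoded by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

When JAVA_TOOL_OPTIONS is set, the JVM also prints a "Picked up JAVA_TOOL_OPTIONS" line on startup, which is a quick sign that the variable reached the process.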

virxen