
I want to print Russian and German characters in the Windows console, so I wrote a small test program to understand how well it works:

```java
PrintStream ps = new PrintStream(System.out, false, "UTF-8");
ps.println("öäüß гджщ");
```
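For completeness, here is the full compilable program (the class name is arbitrary):

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Wrap System.out so that characters are encoded as UTF-8 bytes
        PrintStream ps = new PrintStream(System.out, false, "UTF-8");
        ps.println("öäüß гджщ");
        ps.flush(); // autoFlush is false, so flush explicitly
    }
}
```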

Then I started cmd.exe, changed its font to Lucida Console (which supports Unicode), changed the code page to UTF-8 with "chcp 65001", and executed my program.

The German and Russian characters were printed, but there was a little more text than I expected (the screenshot showed extra garbage characters after the correct output, underlined in red).

But the text is printed correctly in the Eclipse console. Is there a way to print it correctly in the Windows console? I use Windows 7.

I've just solved the problem with JNI, but it is still interesting whether it is doable with pure Java.

ka3ak
    I looked at this a [while back](http://illegalargumentexception.blogspot.co.uk/2009/04/i18n-unicode-at-windows-command-prompt.html) but never did find out why it happens. I expect the reason is buried down in JRE native code. – McDowell Dec 06 '12 at 17:03
    afaik the behavior depends on windows version. From utf8everywhere.org: "On Windows 7 the console displays that character as two invalid characters, regardless of the font used" – Pavel Radzivilovsky Dec 06 '12 at 21:44
  • @McDowell I think you are right, the problem must be in JRE. – ka3ak Dec 07 '12 at 06:03
  • @Pavel Radzivilovsky Your diagnosis is very precise. Maybe there is no other way to solve the problem than to use JNI or JNA. – ka3ak Dec 07 '12 at 06:05
  • try SetConsoleOutputCP(CP_UTF8) and use UTF-8 for output. This should work. – Pavel Radzivilovsky Dec 07 '12 at 06:17
  • @ka3ak It's been over 2 years, but while reading a Java article I read the following: "`System.console().printf(...)` has better support for special characters than the `System.out.println(...)` method." For a similar post see: http://stackoverflow.com/questions/4005378/console-writeline-and-system-out-println – bvdb Apr 11 '15 at 11:15

3 Answers


Every time you open or write a file, a certain encoding is applied. But sometimes we forget that our IDE (Eclipse in your case) also has an encoding.

When you type text between quotes, it is entered and displayed in a certain encoding: the encoding of your IDE. Your assumption is that the encoding of your output stream (UTF-8) guarantees that the text is written out in that specific encoding. However, I think the encoding of your IDE is applied here as well.

I would suggest double-checking the encoding of your Eclipse installation. Perhaps that solves your problem. Certainly worth a try, isn't it? :)

For a global encoding setting, add the following line to the eclipse.ini file:

-Dfile.encoding=UTF-8 
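A quick way to verify which default encoding the JVM actually picked up is a small check program (a sketch; the class name is mine):

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // The system property that eclipse.ini's -Dfile.encoding sets
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        // The charset the JVM actually derived from it
        System.out.println("defaultCharset = " + Charset.defaultCharset());
    }
}
```

If the second line does not say UTF-8, the JVM is still using the platform's legacy encoding.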

EDIT:

I would just like to add the following experiment I performed:

  1. I opened Notepad++ and created a new file.
  2. I set the encoding to UTF-8.
  3. I copied your Russian text, pasted it into the new text file, and saved it.
  4. Next, I opened the Windows console ("cmd").
  5. I executed the "chcp 65001" command.
  6. Then I printed the content of the file in the console: "type file.txt".
  7. Everything showed up correctly.

This does not confirm much, but it does confirm that DOS can do the job if the content is provided in the right encoding.
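Steps 1–3 can also be done programmatically; here is a sketch (the file name is arbitrary) that writes the test string as UTF-8 bytes and reads it back to verify the round trip:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteUtf8File {
    public static void main(String[] args) throws IOException {
        // Same effect as saving from Notepad++ with the encoding set to UTF-8
        Path file = Paths.get("file.txt");
        Files.write(file, "öäüß гджщ".getBytes(StandardCharsets.UTF_8));
        // Read the bytes back and decode them as UTF-8
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(content);
    }
}
```

Running "type file.txt" on the resulting file after "chcp 65001" reproduces the experiment above.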

EDIT2:

@ka3ak It's been over 2 years, but while reading a book about Java I/O I stumbled upon the following.

> `System.console().printf(...)` has better support for special characters than the `System.out.println(...)` method.

Since your PrintStream just wraps the System.out stream, I guess it has the same limitations. I wonder whether this could have solved the problem. If it still matters, please give it a try. :)
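A sketch of that idea; note that `System.console()` returns null when output is redirected (e.g. to a pipe or file), so a fallback is needed:

```java
import java.io.Console;

public class ConsoleDemo {
    public static void main(String[] args) {
        Console console = System.console();
        if (console != null) {
            // Interactive terminal: Console encodes with the console's own charset
            console.printf("öäüß гджщ%n");
        } else {
            // No interactive console attached: fall back to System.out
            System.out.println("öäüß гджщ");
        }
    }
}
```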

Other posts on Stack Overflow report similar things: console.writeline and System.out.println

bvdb
  • I also experimented with the steps you did and I got the same result. The option "-Dfile.encoding=UTF-8" didn't help. I got the same wrong text. – ka3ak Dec 07 '12 at 06:01
  • @Bvdb - the issue is with `System.out` and how it writes to STDOUT. By default it uses the system's legacy ANSI encoding. Attempting to use UTF-8 results in adverse behaviour. _I would also note that not all of the JRE's libraries will respect the `file.encoding` property as it is not a standard property._ – McDowell Dec 07 '12 at 08:55
  • @McDowell I just read somewhere that the `System.console()` has better support for special characters than the `System.out` stream. In a way this is kind of what you were saying 2 years ago. :) – bvdb Apr 11 '15 at 11:27

After reading the answers and recommendations here, I concluded that there must be a problem in the JRE. Maybe this problem only exists on Windows 7 (unfortunately I don't have other Windows systems to experiment with).

The solution is to use JNI or, if you want something simpler, JNA. I've found a useful JNA example that solves my problem here: https://stackoverflow.com/a/8921509/971355

ka3ak

This is due to the ¼-hearted implementation of cp65001 in Windows. See the complete disclosure in @eryksun's answer.

Short summary: only 7-bit (sic!) input/output works reliably with cp65001 (unless the C runtime works around it) up to Windows 7. The output problem is fixed in Windows 8; the input problem is still present in Windows 10.

Ilya Zakharevich