
I've run into an encoding issue. I'm not sure whether it's IDE-related, but I'm using NetBeans 7.4. I have this piece of code in my J2EE project:

    String test = "kukuřičné";
    System.out.println(new String(test.getBytes("UTF-8"))); // should display ok
    System.out.println(new String(test.getBytes("ISO-8859-1")));
    System.out.println(new String(test.getBytes("UTF-16")));
    System.out.println(new String(test.getBytes("US-ASCII")));
    System.out.println(new String(test.getBytes("windows-1250")));
    System.out.println(test); // should display ok

When I run it, none of the lines display properly. UTF-8 should be able to represent that string, but the output is still wrong. Also, when I tried:

    System.out.println(Charset.defaultCharset());

it returned windows-1252. The project is set to UTF-8 encoding. I've even tried resaving this specific java file in UTF-8 but it still doesn't display properly.

On the other hand, when I create a J2SE project and run the same code, it displays properly, and the default charset is reported as UTF-8.

Both projects are set to UTF-8 encoding.

I want my J2EE project to behave the same as the J2SE one. I didn't notice this issue until I updated Java to version 1.7.0_51-b13, but again I'm not sure whether that is related.

I'm experiencing the same issue as the one described here: http://forums.netbeans.org/ptopic37752.html

I've also tried setting the default encoding for the whole IDE with -J-Dfile.encoding=UTF-8, but it didn't help.

I've noticed an important fact: when I create a new web application, it displays correctly. When I create a new Maven web application, it displays incorrectly.

Found the same issue here: https://netbeans.org/bugzilla/show_bug.cgi?id=224526

I still haven't fixed it; none of the solutions I've found works.

In my pom.xml the encoding is set properly, but it still shows windows-1252 in the end.

    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
Lenymm
  • In your example, you get the bytes in your specified encoding but you use the default encoding within the `String` constructor. – Sotirios Delimanolis Mar 17 '14 at 19:54
  • "The project is set to UTF-8 encoding" - I suspect that just means that the *source code* is interpreted as being UTF-8. It has nothing to do with the default encoding of the platform. – Jon Skeet Mar 17 '14 at 19:55
  • This is why you should always specify the encoding when writing to/reading from files, or "serializing" strings to bytes, or deserializing bytes to strings... – fge Mar 17 '14 at 19:58
  • Also, your _console_ may be limited in what you can display; the default Windows console in particular has poor Unicode support! – fge Mar 17 '14 at 20:00
  • In the J2EE project, `System.out.println(test);` prints `kuku?i?né`; in the J2SE project it prints `kukuřičné`. – Lenymm Mar 17 '14 at 20:02
  • Do you output on the same console? – fge Mar 17 '14 at 20:08
  • Yes, both outputs are on the same console. – Lenymm Mar 17 '14 at 20:14

2 Answers


I've spent a few hours trying to find the best solution.

First of all, this is a Maven issue: it picks up the platform encoding and uses it even though a different encoding has been specified. Maven doesn't seem to care (it even prints to the console that it's using UTF-8, but when you run a file with the code above, the output still won't display properly).
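
The `kuku?i?né` output from the question is consistent with the text being encoded in a platform charset that lacks `ř` and `č`. Here is a minimal sketch of that effect, assuming the platform default is windows-1252 as reported in the question; the class and method names are made up for illustration, and the string is written with Unicode escapes so the source file encoding cannot interfere:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
class DefaultCharsetDemo {

    // Encodes the text in the given charset and decodes it with the same one.
    // Characters the charset cannot represent come back as its replacement ('?').
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "kukuřičné" written with Unicode escapes.
        String test = "kuku\u0159i\u010Dn\u00E9";

        // What the running JVM actually uses, regardless of project settings.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // windows-1252 has no ř or č, so both are lost during encoding.
        String lossy = roundTrip(test, Charset.forName("windows-1252"));
        System.out.println(lossy.equals("kuku?i?n\u00E9"));

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(test, StandardCharsets.UTF_8).equals(test));
    }
}
```

Running this in both project types and comparing the first two lines should show whether the two JVMs really start with different default charsets.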

I've managed to work around this issue by setting an environment variable:

    JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

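There should be another option besides setting an environment variable: passing the same property as a JVM argument when the program is launched. Note that -Dfile.encoding is a JVM option rather than a javac flag, so it belongs on the java command line (javac only accepts it in the forwarded form -J-Dfile.encoding=UTF8, where it applies to the compiler's own JVM):

    java -Dfile.encoding=UTF8 ...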
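
Independently of how the default charset ends up being set, a program can avoid relying on it by writing through a stream with an explicit encoding. This is a different technique from the one above, sketched under the assumption that the console receiving the bytes (for example the IDE output window) interprets them as UTF-8; `Utf8Console` and `utf8` are hypothetical names:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the original project.
class Utf8Console {

    // Wraps a stream so that everything printed is encoded as UTF-8,
    // whatever Charset.defaultCharset() reports.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "kukuřičné" written with Unicode escapes.
        String test = "kuku\u0159i\u010Dn\u00E9";
        PrintStream out = utf8(System.out);
        out.println(test);
    }
}
```

If the console itself decodes with another charset, the output will still look wrong even though the bytes are correct UTF-8.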

Lenymm

You are mixing a few concepts here:

  • the project encoding is the encoding used to save the Java source files (`xxxx.java`) - it has nothing to do with how your code executes
  • `test.getBytes("UTF-8")` returns a series of bytes representing your String in UTF-8 encoding
  • to recreate the original string, you need to explicitly give the encoding, unless it is the default of your machine: `new String(test.getBytes("UTF-8"), StandardCharsets.UTF_8)`
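
The points above can be sketched as a small round trip. The class and method names are illustrative only, and ISO-8859-1 merely stands in for "some charset other than the one used for encoding":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
class RoundTripDemo {

    // Decodes bytes with an explicitly named charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "kukuřičné" written with Unicode escapes.
        String test = "kuku\u0159i\u010Dn\u00E9";
        byte[] utf8Bytes = test.getBytes(StandardCharsets.UTF_8);

        // Same charset on both sides: the original string is recovered.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(test));

        // Different charset when decoding: each byte becomes one character,
        // so the multi-byte sequences for ř, č and é turn into garbage.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(test));
        System.out.println(wrong.length() + " chars instead of " + test.length());
    }
}
```

Calling `new String(bytes)` without a charset behaves like the second case whenever the platform default differs from the charset used in `getBytes`.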
assylias
  • This: `System.out.println(new String(test.getBytes("UTF-8"), StandardCharsets.UTF_8));` prints `kuku?i?né` in the J2EE project. – Lenymm Mar 17 '14 at 20:01
  • @Lenymm: By using `new String(test.getBytes("UTF-8"), StandardCharsets.UTF_8)` you create the correct `String` but you are still writing to a console that doesn’t support all characters. – Holger Mar 17 '14 at 20:16
  • When I print it from the J2SE project, it displays properly on the same console. – Lenymm Mar 17 '14 at 20:17