
A little confusion about the Java native2ascii tool. The Java 6 documentation defines the tool as follows:

Converts a file with native-encoded characters (characters which are non-Latin 1 and non-Unicode) to one with Unicode-encoded characters.

Then why does it also convert characters that belong to the Latin-1 table (such as é) to their Unicode-escaped representation (\u00e9)?

The Latin-1 (ISO 8859-1) table is available here, for instance: http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout

That implies that I cannot work directly with properties files for some European languages such as French.

To clarify my question:

native2ascii shouldn't convert Latin-1 characters (as per its description). é is a valid Latin-1 character. So why is it converted?

Thomas Shields
Kemoda

2 Answers


You can work with properties files that contain French and other characters. Properties accepts \uxxxx sequences. You can also work with national characters directly, since Properties has a load(Reader reader) method. The file can then be in any encoding, as long as you provide a reader that decodes it correctly, e.g. new InputStreamReader(new FileInputStream("1.properties"), Charset.forName("ISO-8859-1"));
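As a rough illustration of the load(Reader) approach, here is a minimal sketch. The class name ReaderPropertiesSketch, the greeting key, and the in-memory byte array standing in for a real file are all assumptions made for the example, and UTF-8 is just one possible choice of encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class, not part of any library.
class ReaderPropertiesSketch {
    // Decodes the raw bytes with the given charset, then lets Properties parse the characters.
    static Properties load(byte[] raw, Charset charset) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a UTF-8 encoded properties file (hypothetical data).
        byte[] raw = "greeting=caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        Properties props = load(raw, StandardCharsets.UTF_8);
        System.out.println(props.getProperty("greeting"));
    }
}
```

With a real file, the ByteArrayInputStream would be replaced by a FileInputStream, as in the snippet above. This only helps when you control how the file is opened.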

I also agree that native2ascii should not convert é, because it is a legal Latin-1 character and the docs say Latin-1 characters are not converted.

Evgeniy Dorofeev
  • Yes, but I cannot provide a reader; I have to use the default one. That means I cannot type é directly in the file. I have to either write \uxxxx or use, for instance, Maven to run native2ascii while generating the product – Kemoda Aug 27 '13 at 12:27
  • Furthermore, my question is: why does native2ascii convert Latin-1 characters when it shouldn't (based on the tool description)? Is the documentation erroneous? – Kemoda Aug 27 '13 at 12:29
  • I also think the docs are wrong and should say non-ASCII chars, but note that Properties.load(InputStream) reads the file assuming it is ISO-8859-1, so it is supposed to read ISO-8859-1 props files fine without \uxxxx – Evgeniy Dorofeev Aug 27 '13 at 12:37
  • Change your answer to reflect that the documentation is outdated and I will accept it – Kemoda Sep 03 '13 at 09:45

The source of confusion might be that the documentation changed with Java version 7.

In Java 6, the documentation for Solaris and Unix ( http://docs.oracle.com/javase/6/docs/technotes/tools/solaris/native2ascii.html ) says: "The Java compiler and other Java tools can only process files which contain Latin-1 and/or Unicode-encoded (\udddd notation) characters. native2ascii converts files which contain other character encodings into files containing Latin-1 and/or Unicode-encoded charaters."

I think this clearly means that the output is Latin-1, and that characters not in Latin-1 are Unicode-escaped in the output.

I checked OpenJDK 6 on Ubuntu, and the native2ascii there does not conform to the documentation: it outputs Latin-1 characters as Unicode escapes. So either the documentation or the native2ascii tool can be considered incorrect in that case.

However, in Java 7 and Java 8 the documentation ( http://docs.oracle.com/javase/7/docs/technotes/tools/solaris/native2ascii.html and https://docs.oracle.com/javase/8/docs/technotes/tools/unix/native2ascii.html ) says: "native2ascii converts files that are encoded to any character encoding that is supported by the Java runtime environment to files encoded in ASCII, using Unicode escapes ("\uxxxx" notation) for all characters that are not part of the ASCII character set."

I checked the OpenJDK 8 native2ascii on Ubuntu and found that it behaves as documented: it converts non-ASCII Latin-1 characters such as é to Unicode escapes.
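To make the Java 7/8 wording concrete, here is a minimal sketch of the documented rule (keep ASCII, escape everything else). It is not the tool's actual source; the class name Native2AsciiSketch and the method escapeNonAscii are hypothetical names chosen for the example.

```java
// Hypothetical illustration of the documented behavior, not the real tool.
class Native2AsciiSketch {
    // Keeps chars up to 0x7F as-is and writes every other char as a
    // backslash, the letter u, and four lowercase hex digits.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains one Latin-1 character outside ASCII (e with acute accent).
        System.out.println(escapeNonAscii("caf\u00e9"));
    }
}
```

Running main prints caf\u00e9, which matches the behavior described in the question: é is in Latin-1 but not in ASCII, so it is escaped.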

Note that the 7/8 documentation also mentions: "This process is required for properties files containing characters not in ISO-8859-1 character sets."

I think this clearly means that properties files containing Latin-1 (aka ISO-8859-1) encoded characters are still valid.
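A quick way to check this is to feed Properties.load(InputStream) the same value twice, once as a raw Latin-1 byte and once as a \uxxxx escape. The sketch below does that with in-memory bytes; the class name Latin1PropertiesSketch and the greeting key are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class, not part of any library.
class Latin1PropertiesSketch {
    // Loads the bytes through the InputStream overload, which decodes them as ISO-8859-1.
    static String readGreeting(byte[] raw) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        // Same value, written once as a raw Latin-1 byte and once as an ASCII escape.
        byte[] rawLatin1 = "greeting=caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] escaped = "greeting=caf\\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readGreeting(rawLatin1).equals(readGreeting(escaped)));
    }
}
```

Both inputs yield the same string, so the program prints true. Escaping is only needed for characters that Latin-1 cannot represent.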

riskop