I'm writing an Ant task in Java.

In my build.xml I specify parameters that my Java class should read. Problems occur when I use special characters, such as German umlauts (Ö, Ä, Ü), in these parameters: in my Java task they appear as ? characters (printed with System.out.print from within Eclipse).

All my files are encoded as UTF-8, and my build.xml has the corresponding declaration:

<?xml version="1.0" encoding="UTF-8" ?>

As for how the task is written: I follow http://ant.apache.org/manual/develop.html (especially point 5, nested elements). My task has nested elements like:

<parameter name="test"   value="ÖÄÜtest"/>

and a java method:

public void addConfiguredParameter(Parameter prop) {
    System.out.println(prop.getValue());
    //prints ???test
}

to read the parameter values.

räph
  • What do you mean by "they aren't recognized"? Where do the ? glyphs show up? Chances are that is an imperfection of whatever editor you view the result through, not in Java. – Kilian Foth Jun 14 '10 at 12:21
  • The glyphs show up in printouts (I updated my question to clarify). I'm using Eclipse. – räph Jun 14 '10 at 12:29
  • Does your console also show output in UTF-8? – Inv3r53 Jun 14 '10 at 13:44

3 Answers


There are several transcoding operations going on here:

  1. Saving the XML as UTF-8 by your editor
    • Check that the characters are encoded correctly using a hex editor
  2. The parsing of the XML by Ant from UTF-8 to UTF-16 strings
    • A fault here is very unlikely
  3. Transcoding by the System.out PrintStream from UTF-16 strings to the platform encoding
    • Check that the encoding used supports the characters
  4. Decoding of the bytes received by the Eclipse console into UTF-16 strings for display
    • Check that the encoding used by the console matches that of the PrintStream
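
Steps 3 and 4 can be checked from inside the task itself. The following is only a minimal sketch: the class name ConsoleEncodingCheck and the helper printedBytes are hypothetical, and the choice of US-ASCII as the "wrong" encoding is an assumption made to reproduce the symptom, not a claim about what your platform actually uses. It shows how a PrintStream replaces characters that its encoding cannot represent with ?, and how to wrap System.out with an explicit UTF-8 encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingCheck {

    // Returns the bytes a PrintStream using the given charset would emit for the text.
    static byte[] printedBytes(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The platform default is what System.out normally encodes with.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Unicode escapes keep this source file independent of its own encoding.
        String value = "\u00D6\u00C4\u00DCtest";

        // An encoding without umlauts (US-ASCII is an assumed example) substitutes '?'.
        System.out.println(new String(printedBytes(value, "US-ASCII"), "US-ASCII"));

        // Wrapping System.out with an explicit encoding; the console must decode UTF-8 too.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(value);
    }
}
```

If the last line still shows wrong characters, the mismatch is more likely on the console side (step 4) than in the PrintStream (step 3).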

If the file is saved as UTF-8, you would expect the following byte sequences in your XML file:

Grapheme  UTF-8 encoded bytes
Ö         c3 96
Ä         c3 84
Ü         c3 9c
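
If no hex editor is at hand, the same byte values can be reproduced in Java for comparison. This is a small sketch; the class name Utf8Hex and the method toHex are hypothetical helpers, not part of Ant or the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {

    // Formats the UTF-8 encoding of the text as space-separated lowercase hex bytes.
    static String toHex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes for O-umlaut, A-umlaut and U-umlaut.
        System.out.println(toHex("\u00D6")); // c3 96
        System.out.println(toHex("\u00C4")); // c3 84
        System.out.println(toHex("\u00DC")); // c3 9c
    }
}
```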
McDowell

Have you tried starting java with the following parameter?

-Dfile.encoding=UTF-8
hooray
  • Actually I don't have any run configuration for which I could specify the encoding. I just execute my ant script from within eclipse, which calls my java task! – räph Jun 14 '10 at 13:03
  • Java does not support configuration of default transcoding operations from the command line. From Sun's bug database: _The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution._ http://bugs.sun.com/view_bug.do?bug_id=4163515 – McDowell Jun 14 '10 at 14:25

The problem seems to have resolved itself. It was probably already fixed by switching everything to UTF-8, and Eclipse may simply have been slow to pick up the change. In any case, I can no longer reproduce the error.

One problem remained: when I referred to a build.properties file (which uses the characters mentioned) from my build.xml, my Java task still didn't get the characters right. I could work around this by writing \u followed by the hex code of each letter, although that's not really convenient!

räph
  • Properties files (so long as they aren't XML) are restricted to `ISO 8859-1`, so you must use Unicode escape sequences for characters not in this range. – McDowell Jun 15 '10 at 08:06
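
The ISO 8859-1 restriction can be illustrated with java.util.Properties directly. This is a sketch under assumptions: the class PropertiesEscapeDemo, the helper loadValue and the key "test" are made up for the example, and it only shows the behaviour of Properties.load(InputStream), not how Ant itself reads build.properties.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEscapeDemo {

    // Loads the given file content through Properties.load(InputStream),
    // which decodes the bytes as ISO 8859-1 and resolves \\uXXXX escapes.
    static String loadValue(byte[] fileBytes, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileBytes));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // File content with literal backslash-u escape sequences (pure ASCII).
        byte[] escaped = "test=\\u00D6\\u00C4\\u00DCtest"
                .getBytes(StandardCharsets.ISO_8859_1);

        // The same value written as raw UTF-8 bytes, as an editor might save it.
        byte[] rawUtf8 = "test=\u00D6\u00C4\u00DCtest"
                .getBytes(StandardCharsets.UTF_8);

        String expected = "\u00D6\u00C4\u00DCtest";
        System.out.println(expected.equals(loadValue(escaped, "test"))); // true
        System.out.println(expected.equals(loadValue(rawUtf8, "test"))); // false
    }
}
```

In the second case each umlaut arrives as two characters, because its two UTF-8 bytes are each decoded as a separate ISO 8859-1 character.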