The Eula module from code.google.com works great but it doesn't display Unicode characters (e.g. ©) for some reason.

I know that AlertDialogs are perfectly capable of displaying Unicode characters, because I do so in other dialogs in my app.

The only difference I have been able to find between Eula's dialog and others is that the Eula.java dialog gets its string from a text file in the assets folder:

  private static CharSequence readEula(Activity activity) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(new InputStreamReader(activity.getAssets().open(ASSET_EULA)));
      String line;
      StringBuilder buffer = new StringBuilder();
      while ((line = in.readLine()) != null)
        buffer.append(line).append('\n');
      return buffer;
    }
    catch (IOException e) {
      return "";
    }
    finally {
      closeStream(in);
    }
  }

That text file displays all Unicode characters in Notepad++, so I can only suspect InputStreamReader, BufferedReader or StringBuilder doing something to the string on its way from the assets file to the AlertDialog.

How can I make Eula.java display Unicode?

matt b
ef2011

2 Answers

Give InputStreamReader the encoding of the source data; this class transcodes data to UTF-16 character data.
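As a minimal sketch of that advice, the snippet below passes an explicit charset to the two-argument `InputStreamReader` constructor. The class `EulaReaderSketch` and the helper `readText` are hypothetical names used only for illustration; they are not part of Eula.java. It also assumes the source data really is UTF-8 and that `StandardCharsets` (Java 7 and later) is available; on older platforms the charset name can be passed as a string such as "UTF-8" instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EulaReaderSketch {
    // Hypothetical helper: reads every line of the stream, decoding the
    // bytes with the charset the caller names instead of a default.
    static String readText(InputStream stream, Charset charset) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream, charset))) {
            StringBuilder buffer = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.append(line).append('\n');
            }
            return buffer.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the asset file: the bytes of a UTF-8 encoded line.
        byte[] utf8Bytes = "Copyright \u00A9 2011".getBytes(StandardCharsets.UTF_8);
        String text = readText(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        System.out.print(text);
    }
}
```

In the question's code, the stream passed in would presumably be the one returned by `activity.getAssets().open(ASSET_EULA)`, and the charset must match how the asset file was actually saved.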

McDowell
  • I tried this. It still doesn't work. See my comments below to @Joachim Sauer. +1 anyway. – ef2011 May 17 '11 at 16:30
  • @ef2011 - if the text appears correctly in Notepad++, use the _Encoding_ menu to see what encoding is being used. If it is "ANSI", then it is a Windows encoding (which one depends on the OS.) In your hex editor the code point U+00A9 (©) has the following values in respective encodings: UTF-8 `C2 A9`; windows-1252 `A9`; ISO-8859-1 `A9`. – McDowell May 17 '11 at 20:25
  • You were right on the money. It turns out Notepad++ shows ANSI and once I changed the 2nd parameter to `InputStreamReader()` from "utf-8" to "ISO-8859-1" it works now. Wow. +1 again and accepting. Thank you! – ef2011 May 18 '11 at 00:08
  • @ef2011 - ISO-8859-1 is not the same as Windows [ANSI](http://blogs.msdn.com/b/oldnewthing/archive/2004/05/31/144893.aspx). If you are developing on Western European system, it will be windows-1252 (aka [Cp1252](http://download.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html)). See the [Microsoft documentation](http://msdn.microsoft.com/en-gb/goglobal/bb964654) for a list of "ANSI" encodings. In any case, I advise you to save the file as UTF-8 instead - this is the 21st century. – McDowell May 18 '11 at 00:45
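The byte values quoted in the comments can be checked with a short program. This is only a sketch: the class `CopyrightBytesSketch` and the helper `hex` are hypothetical names, and it assumes a JVM that ships the windows-1252 charset. It also shows the likely reason the "UTF-8" attempt produced a question mark: a lone `A9` byte is not valid UTF-8, so the decoder substitutes the replacement character U+FFFD, which many fonts draw as a question-mark glyph.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CopyrightBytesSketch {
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        String copyright = "\u00A9";

        // Encoding the same character gives different bytes per charset.
        System.out.println(hex(copyright.getBytes(StandardCharsets.UTF_8)));      // C2 A9
        System.out.println(hex(copyright.getBytes(windows1252)));                 // A9
        System.out.println(hex(copyright.getBytes(StandardCharsets.ISO_8859_1))); // A9

        // Decoding an "ANSI" byte as UTF-8 fails: the lone A9 is malformed
        // input, so the decoder emits the replacement character U+FFFD.
        byte[] ansiBytes = {(byte) 0xA9};
        String decodedAsUtf8 = new String(ansiBytes, StandardCharsets.UTF_8);
        String decodedAs1252 = new String(ansiBytes, windows1252);
        System.out.println(decodedAsUtf8.equals("\uFFFD")); // true
        System.out.println(decodedAs1252.equals("\u00A9")); // true
    }
}
```

For the copyright sign, windows-1252 and ISO-8859-1 agree, which is why either name decodes this particular file correctly; the two differ for bytes in the range `80` to `9F`.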

In "normal" Java the single-argument InputStreamReader constructor uses the platform default encoding.

Android defines this slightly differently, saying:

This constructor sets the character converter to the encoding specified in the "file.encoding" property and falls back to ISO 8859_1 (ISO-Latin-1) if the property doesn't exist.

So setting file.encoding to the encoding used in the asset (probably UTF-8) might just do the trick.

Alternatively (if you can edit it), just change the constructor call to the two-argument version and specify the correct encoding this way.

Joachim Sauer
  • I tried calling the two-argument version: With either "UTF16" or "UTF-16" it displays "Chinese" (original text in English!). With either "UTF8" or "UTF-8" it displays exactly as before: Correct English text with the © turned into a question mark symbol. What's happening? – ef2011 May 17 '11 at 16:15
  • Looking at the original file via a Hex editor, it looks like a UTF-8 file... I still don't understand why "UTF8" or "UTF-8" wouldn't work. – ef2011 May 17 '11 at 16:21