I have a file that contains accented characters: ÇÍââÇÍ

I need to convert them to ISO-8859-15 encoding.

The code:

    String fileName = "C:/Users/User/AppData/Local/Temp/temp6893820181068878551.txt";

    File file = new File(fileName);
    FileInputStream fin = new FileInputStream(file);

    // Read the whole file into a byte array
    FileChannel ch = fin.getChannel();
    int size = (int) ch.size();
    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);

    byte[] utf8bytes = new byte[size];
    buf.get(utf8bytes);

    // Decodes with the platform default charset, which is not necessarily UTF-8
    System.out.println(new String(utf8bytes));

    System.out.println();
    System.out.println();

    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso885915charset = Charset.forName("ISO-8859-15");

    // Decode the bytes as UTF-8
    String string = new String(utf8bytes, utf8charset);
    System.out.println(string);
    System.out.println();
    System.out.println();

    // Re-encode the text as ISO-8859-15 and dump the bytes in hex
    byte[] iso885915bytes = string.getBytes(iso885915charset);

    for (byte b : iso885915bytes)
        System.out.printf("%02x ", b);

    System.out.println();
    System.out.println();

    // Round trip: decode the ISO-8859-15 bytes back into a String
    String string2 = new String(iso885915bytes, iso885915charset);

    System.out.println(string2);

But I get this output:

ÇÍââÇÍ


??????


3f 3f 3f 3f 3f 3f 

??????
  • I dunno man. It worked for me. I can get bytes just like you did and they're not question marks. Are you **SURE** your input file has what you think in it? – markspace Sep 23 '14 at 16:12
  • what do you get? not 3fs? –  Sep 23 '14 at 16:14
  • well it prints what it has in it, the first line in the output –  Sep 23 '14 at 16:15
  • I get for the bytes: `0x199 0x205 0x226 0x226 0x199 0x205` and then the characters print fine: ÇÍââÇÍ – markspace Sep 23 '14 at 16:16
  • nop I still get 3f 3f 3f 3f 3f 3f ?????? –  Sep 23 '14 at 16:26
  • Something on your end. You may have edited the source to make your example more clear, and removed the line causing the problem. Check again carefully any lines you didn't show us. – markspace Sep 23 '14 at 16:27
  • are you sure you are reading them from a file? –  Sep 23 '14 at 16:28

2 Answers

Try normalizing the string before calling `.getBytes()` on it, i.e. call `Normalizer.normalize(string, Normalizer.Form.NFC)`.

Accented characters that look the same can be represented by different Unicode code point sequences. Perhaps only the NFC form can be converted to ISO-8859-15?
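As a minimal sketch of the situation this answer has in mind (the class and method names below are made up for illustration, not taken from the question): a decomposed "Ç", written as `C` followed by the combining cedilla U+0327, has no counterpart for the combining mark in ISO-8859-15, so it encodes as `C` plus `?`. After NFC normalization the pair becomes the single code point U+00C7, which does map.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

public class NormalizeBeforeEncode {
    // ISO-8859-15 is the target charset named in the question
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hypothetical helper: encode to ISO-8859-15 and render the bytes as hex
    static String toLatin9Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(LATIN9)) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Decomposed form: 'C' followed by U+0327 COMBINING CEDILLA
        String decomposed = "C\u0327";
        // NFC composes the pair into U+00C7 (Ç)
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);

        System.out.println(toLatin9Hex(decomposed)); // 43 3f
        System.out.println(toLatin9Hex(composed));   // c7
    }
}
```

Note that this only helps when the text really is in decomposed form; it does not repair bytes that were decoded with the wrong charset.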

  • Really? When I try this code: `String s = "ÇÍââÇÍ"; Charset iso88591charset = Charset.forName("ISO-8859-15"); byte[] bytes = Normalizer.normalize(s, Normalizer.Form.NFKC).getBytes(iso88591charset); for ( byte b : bytes ) System.out.printf("%02x ", b);` It prints the following: `c7 cd e2 e2 c7 cd` – David Ekholm Sep 23 '14 at 17:22

I found the solution!

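The problem was the file itself: it had not been written as UTF-8.

The original file must be written in UTF-8 encoding, because the code decodes its bytes as UTF-8. Bytes written in another encoding are not valid UTF-8, so they decode to the replacement character U+FFFD, which ISO-8859-15 cannot represent, and `getBytes` then emits `?` (0x3f) for each of them.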
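The effect can be reproduced with a short sketch. It assumes the file was originally saved in a single-byte encoding; ISO-8859-1 stands in for that here, since the actual encoding is not stated in the question. The class and method names are hypothetical, and temp files replace the original path.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileToLatin9 {
    // Decode file content as UTF-8, then re-encode it as ISO-8859-15
    static byte[] reencode(byte[] fileBytes) {
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        return text.getBytes(Charset.forName("ISO-8859-15"));
    }

    // Render bytes as space-separated two-digit hex
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00C7\u00CD\u00E2\u00E2\u00C7\u00CD"; // ÇÍââÇÍ

        Path utf8File = Files.createTempFile("utf8-", ".txt");
        Path latin1File = Files.createTempFile("latin1-", ".txt");
        try {
            // Written as UTF-8: matches what the reading code expects
            Files.write(utf8File, text.getBytes(StandardCharsets.UTF_8));
            // Written as ISO-8859-1 (assumed stand-in for the wrong encoding)
            Files.write(latin1File, text.getBytes(StandardCharsets.ISO_8859_1));

            // prints: c7 cd e2 e2 c7 cd
            System.out.println(hex(reencode(Files.readAllBytes(utf8File))));
            // prints: 3f 3f 3f 3f 3f 3f
            System.out.println(hex(reencode(Files.readAllBytes(latin1File))));
        } finally {
            Files.deleteIfExists(utf8File);
            Files.deleteIfExists(latin1File);
        }
    }
}
```

If the file cannot be rewritten as UTF-8, the alternative is to decode it with the charset it was actually saved in rather than with UTF-8.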