My program should print every CP1252 code as a character:

System.out.println("actual file.encoding: "+System.getProperty("file.encoding")); // CP1252


for (int i = 0; i < 500; i++) {
    System.out.println("Nr.: " + i + " Symbol: " + (char) i);
}

But the output is (an excerpt of the whole output):

Nr.: 124 Symbol: |
Nr.: 125 Symbol: }
Nr.: 126 Symbol: ~
Nr.: 127 Symbol: 
Nr.: 128 Symbol: ?
Nr.: 129 Symbol: ?
Nr.: 130 Symbol: ?
Nr.: 131 Symbol: ?
Nr.: 132 Symbol: ?
Nr.: 133 Symbol: ?
Nr.: 134 Symbol: ?
Nr.: 135 Symbol: ?

But according to https://en.wikipedia.org/wiki/Windows-1252, 134 is †.

Why doesn't it show †?

Mark Rotteveel
kelloaos
  • You are missing that char and string within Java are always UTF-16/UCS-2. The default encoding only applies when converting from bytes to string (or char) and vice versa (without specifying an explicit encoding). That doesn't apply here. – Mark Rotteveel Jul 19 '18 at 11:55

2 Answers

The byte value 134 (0x86) in CP1252 is indeed the dagger, but a char in Java is always a UTF-16 code unit (Unicode). In Unicode, U+0080 to U+009F (code points 128 to 159) are non-graphic control characters, while U+2020 is the character corresponding to CP1252 byte 0x86.
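
A quick way to check this is to ask Java how it classifies the code points. This is only a sketch: the class name ControlRangeCheck and the helper isControl are made up for illustration, and the only library call is Character.isISOControl.

```java
class ControlRangeCheck {
    // True for C0/C1 control characters: U+0000-U+001F and U+007F-U+009F.
    static boolean isControl(int codePoint) {
        return Character.isISOControl(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isControl(134));    // U+0086, what (char) 134 really is
        System.out.println(isControl(0x2020)); // U+2020, the dagger
        System.out.println(isControl(160));    // U+00A0, first code point after the control range
    }
}
```

The first call reports true and the other two report false, so (char) 134 is a control character rather than a printable symbol.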

Use System.out.write(/*int 0-255 only*/i) to output an already-encoded byte. Or, less convenient in this case but preferable in others, put the bytes in a byte[] array and use System.out.write(byte[]).
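
If you would rather keep working with chars and Strings, another option is to decode each byte value with an explicit charset instead of casting the int to char. A minimal sketch, assuming your JVM provides the "windows-1252" charset (common JDKs do, but it is not one of the charsets every JVM must support); the class name Cp1252Table and the helper decode are hypothetical:

```java
import java.nio.charset.Charset;

class Cp1252Table {
    // Interpret one byte value (0-255) as CP1252 and return the decoded text.
    static String decode(int byteValue) {
        Charset cp1252 = Charset.forName("windows-1252");
        return new String(new byte[] { (byte) byteValue }, cp1252);
    }

    public static void main(String[] args) {
        for (int i = 124; i < 136; i++) {
            String s = decode(i);
            // Print the Unicode code point too, since the glyph shown
            // still depends on the console's own encoding.
            System.out.printf("Nr.: %d Symbol: %s (U+%04X)%n", i, s, (int) s.charAt(0));
        }
    }
}
```

Here decode(134) yields the String "\u2020" (†). Whether the console then shows the glyph correctly still depends on the encoding used by System.out and by the terminal.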

dave_thompson_085
Ah, now it works. Does anyone know which charsets are involved here? I will find out later, but right now it is too confusing. Thank you. It works with the Unicode code point U+2020 (hex), which corresponds to 8224 (decimal):

fW.write("Omg it writes † : ");
fW.write(13);     // carriage return
fW.write(10);     // line feed
fW.write(0x2020); // U+2020 (†) given in hex
fW.write(8224);   // the same code point given in decimal
fW.write(13);
fW.write(10);

Output:

    Begin:
Omg it writes † : 
††
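
As for which charsets are involved: the writer turns the UTF-16 char U+2020 into bytes using some charset, and the result differs per charset. The sketch below shows the bytes for three of them. It assumes fW is a Writer that uses the platform default (CP1252 here), which the snippet above does not show; the class name DaggerBytes and both helpers are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DaggerBytes {
    // Encode text to bytes using the given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Render bytes as space-separated uppercase hex, e.g. "E2 80 A0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dagger = "\u2020";
        System.out.println(hex(encode(dagger, Charset.forName("windows-1252")))); // 86
        System.out.println(hex(encode(dagger, StandardCharsets.UTF_8)));          // E2 80 A0
        System.out.println(hex(encode(dagger, StandardCharsets.UTF_16BE)));       // 20 20
    }
}
```

So the char inside Java is always U+2020, and only the encoding step at the file boundary decides whether it lands on disk as the single CP1252 byte 0x86 or as something else.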
kelloaos
  • I assume that my Java program takes i, my iterator (I hope that is the right word), and converts it into a 2-byte UTF-16 character. That character is then written to the text file, which uses a different charset and does not understand all UTF-16 characters. Maybe it is CP1252? – kelloaos Jul 19 '18 at 11:54