
I have this line of code in Java:

new BufferedWriter(new OutputStreamWriter(new FileOutputStream(name, append), "UTF-8"));

This writer does not seem to produce a UTF-8 file: when I open the file in Notepad++, it reports the encoding as "ANSI as UTF-8". I need it to be pure UTF-8.

Do you have any suggestions?

skaffman
yelo3
  • If your file only contains ASCII characters, there will be no difference: whether it's saved as UTF-8 or ASCII, the file contents will be exactly the same, unless you add the BOM bytes (0xEF, 0xBB, 0xBF). – shinkou Jul 22 '11 at 08:52
  • I wouldn't just go by what Notepad++ says. Have you looked at the contents of the file? – Jon Skeet Jul 22 '11 at 08:52
  • See: http://stackoverflow.com/questions/1380690/what-is-ansi-as-utf-8-and-how-can-i-make-fputcsv-generate-utf-8-w-bom – dacwe Jul 22 '11 at 08:54
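Following up on looking at the actual file contents: a minimal sketch that prints the first bytes of a file in hex, so you can see whether it starts with EF BB BF. The class name HexPeek and the default file name "output.txt" are placeholders, and reading the whole file with Files.readAllBytes is a simplification that is only suitable for small files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexPeek {
    // Formats up to `limit` leading bytes as space-separated uppercase hex, e.g. "EF BB BF 41".
    static String hexPrefix(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(limit, data.length); i++) {
            if (i > 0) sb.append(' ');
            // Mask to 0..255 so negative Java bytes print as two hex digits.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "output.txt" is a placeholder; pass the real file name as the first argument.
        String name = args.length > 0 ? args[0] : "output.txt";
        byte[] data = Files.readAllBytes(Paths.get(name));
        System.out.println(hexPrefix(data, 16));
    }
}
```

If the output starts with "EF BB BF", the file has a UTF-8 BOM; otherwise it does not, regardless of what the editor displays.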

3 Answers


Notepad++ (like any other tool) can only guess the encoding; it isn't recorded anywhere in your file (or in any metadata).

And if the text you've written doesn't contain any characters outside the ASCII range (i.e. no character with a Unicode code point above 127), then a file with ANSI encoding is byte-for-byte identical to, and therefore indistinguishable from, one in UTF-8 encoding.
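To make this concrete, here is a small sketch that encodes the same string two ways and compares the bytes. It uses ISO-8859-1 as a stand-in for the Windows "ANSI" code page (an assumption: the actual ANSI code page depends on the system, but ISO-8859-1 is guaranteed to exist in every JVM and agrees with it on ASCII).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AnsiVsUtf8 {
    // True if encoding `text` as ISO-8859-1 (stand-in for "ANSI") and as UTF-8 yields identical bytes.
    static boolean sameBytes(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII: every code point is <= 127, so both encodings produce the same bytes.
        System.out.println(sameBytes("Hello, world")); // true
        // U+00E9 is one byte (E9) in ISO-8859-1 but two bytes (C3 A9) in UTF-8.
        System.out.println(sameBytes("caf\u00E9"));     // false
    }
}
```

Only once a non-ASCII character appears do the two encodings diverge, which is the only point at which a tool has anything to go on.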

Joachim Sauer

Notepad++ uses a heuristic algorithm to detect the encoding, i.e. the detected encoding can differ from the true one (it's a guess).

In this case, Notepad++ has detected the encoding correctly but labels it confusingly: "ANSI as UTF-8" means pure UTF-8, just without a BOM.
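A much-simplified illustration of this kind of guessing (not Notepad++'s actual algorithm, which is more involved): check whether the bytes decode as strict UTF-8, then check for a BOM. The enum labels are hypothetical. Note that plain ASCII text lands in UTF8_WITHOUT_BOM, which is exactly the case Notepad++ shows as "ANSI as UTF-8".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    // Hypothetical labels, loosely modeled on what an editor might display.
    enum Guess { UTF8_WITH_BOM, UTF8_WITHOUT_BOM, NOT_UTF8 }

    // True if the data starts with the UTF-8 BOM bytes EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Strict decode: malformed or unmappable input throws instead of being replaced.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static Guess guess(byte[] data) {
        if (!isValidUtf8(data)) return Guess.NOT_UTF8;
        return hasUtf8Bom(data) ? Guess.UTF8_WITH_BOM : Guess.UTF8_WITHOUT_BOM;
    }
}
```

A byte such as E9 on its own (an ISO-8859-1 "é") is not valid UTF-8, so it would be reported as NOT_UTF8; pure ASCII is valid UTF-8 by construction.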

Community
phihag

Most likely Notepad++ needs the BOM at the beginning of your file to report it as UTF-8. Write the bytes EF BB BF to your file first, then the encoded characters.
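A minimal sketch of this, reusing the writer chain from the question: writing the character U+FEFF through the UTF-8 writer produces exactly the bytes EF BB BF. Because the question opens the file in append mode, this sketch (an assumption about the desired behaviour) only writes the BOM when the file is new or empty, so repeated appends don't leave stray BOMs in the middle of the file. The method name openUtf8WithBom and the file name "output.txt" are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomWriter {
    // Opens a UTF-8 writer like the one in the question, and writes a BOM first
    // if the file is being created, truncated, or is currently empty.
    static Writer openUtf8WithBom(String name, boolean append) throws IOException {
        File file = new File(name);
        // Decide before opening: FileOutputStream creates the file as a side effect.
        boolean needsBom = !append || !file.exists() || file.length() == 0;
        Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(name, append), StandardCharsets.UTF_8));
        if (needsBom) {
            out.write('\uFEFF'); // U+FEFF is encoded as EF BB BF in UTF-8
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // "output.txt" is a placeholder file name.
        try (Writer w = openUtf8WithBom("output.txt", true)) {
            w.write("Hello, world\n");
        }
    }
}
```

Keep in mind that the BOM is optional in UTF-8 and some consumers treat it as an unwanted extra character, so only add it if the tools reading the file actually expect it.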

Mot