BufferedWriter outputting strange characters when saved to new file

Question

I'm using the following code to process a large text file, line by line. The problem is that I'm using a language other than English, Croatian to be precise. Many of the characters appear as � in the output file. How can I resolve this?

The file is in ANSI, but this does not seem to be an encoding type compatible with InputStreamReader. What encoding type should I save the original file as?

try (BufferedWriter bw = new BufferedWriter(new FileWriter(FILENAME))) {

 String line;
 try {
  try (
   InputStream fis = new FileInputStream("C:\\Users\\marti\\Documents\\Software Projects\\Java Projects\\TwitterAutoBot\\src\\main\\resources\\EH.Txt");
   InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
   BufferedReader br = new BufferedReader(isr);
  ) {
   while ((line = br.readLine()) != null) {
    // Deal with the line

    String content = line.substring(line.lastIndexOf("  ") + 1);
    System.out.println(content);

    bw.write("\n\n" + content);

   }
  }
 } catch (IOException e) {
  e.printStackTrace();
 }

 // bw.close();

} catch (IOException e) {

 e.printStackTrace();

}

@MartinErlic If it is `ANSI`, *why* did you specify **`UTF-8`** in your code? --- If it is [`ANSI`](https://en.wikipedia.org/wiki/ANSI_character_set), which flavor of [extended ANSI](https://en.wikipedia.org/wiki/Extended_ASCII) is it? — Andreas, Dec 18 '17 at 01:02
Because I didn't check the character encoding of the file beforehand! — Martin Erlic, Dec 18 '17 at 01:05
However, ANSI is not a recognized encoding type in InputStreamReader. Somebody suggested using `US-ASCII`, but this doesn't work either and produces the same weird characters. Neither does saving the file as UTF-8, because I lose the translations. — Martin Erlic, Dec 18 '17 at 01:08
@MartinErlic What "translations" are you talking about? You shouldn't have any problems with UTF-8 for any European language. Wikipedia also claims that [Windows-1250](https://en.wikipedia.org/wiki/Windows-1250) is suitable for Croatian. — user882813, Dec 18 '17 at 01:30

score 0 · Accepted Answer · answered Dec 18 '17 at 01:27

I solved this by reading the file with the Cp1252 charset instead of UTF-8, because the file was saved as ANSI.
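
To illustrate why the charset passed to the reader matters, here is a minimal sketch that decodes the same bytes twice. The class name `DecodeDemo`, the helper `decode`, and the sample word are made up for illustration and are not taken from the actual file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes the same raw bytes with whichever charset is passed in.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample: the word "šuma" as an ANSI (windows-1252) editor stores it.
        // The letter š is the single byte 0x9A, which is not valid UTF-8 on its own.
        byte[] raw = {(byte) 0x9A, 'u', 'm', 'a'};

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asCp1252 = decode(raw, Charset.forName("Cp1252"));

        // Print code points so the result does not depend on the console encoding.
        System.out.printf("UTF-8:  U+%04X%n", (int) asUtf8.charAt(0));   // U+FFFD, the replacement character
        System.out.printf("Cp1252: U+%04X%n", (int) asCp1252.charAt(0)); // U+0161, the letter š
    }
}
```

Cp1252 contains š and ž but not č, ć, or đ, so if those letters still come out wrong, the Windows-1250 charset suggested in the comments may be worth trying in the same way.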

Martin Erlic

Greg Kopff · Answer 2 · 2017-12-18T01:20:36.273

You need to use the InputStreamReader/OutputStreamWriter constructors that take a Charset. A constructor that takes no charset, such as the FileWriter(String) one in your code, uses the default charset for your platform, which evidently is not what you need.
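
As a rough sketch of that advice, the reader and writer below both name their charset explicitly. The class name `ExplicitCharsetCopy`, the method `copyLines`, and the choice of windows-1252 for the input are assumptions for illustration; the right input charset depends on how the file was actually saved.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetCopy {
    // Decodes 'in' with inputCharset and re-encodes every line to 'out' as UTF-8.
    static void copyLines(InputStream in, Charset inputCharset, OutputStream out) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, inputCharset));
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ExplicitCharsetCopy <input file> <output file>");
            return;
        }
        // Assumption: the input was saved as windows-1252 ("ANSI" on a Western European Windows).
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            copyLines(in, Charset.forName("windows-1252"), out);
        }
    }
}
```

Compared with the code in the question, the main change on the output side is replacing FileWriter with an OutputStreamWriter wrapped around a FileOutputStream, so that the output encoding no longer depends on the machine the program runs on.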

If you're using Java 8 or above, you might use one of the convenience methods in Files that accept a Charset, such as Files.newBufferedReader and Files.newBufferedWriter.
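
A minimal sketch of that approach, assuming the input is windows-1252 and the output should be UTF-8. The class name `FilesCharsetCopy` and the method `convert` are hypothetical, and the paths come from the command line rather than from the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FilesCharsetCopy {
    // Reads 'source' using sourceCharset and writes each line to 'target' as UTF-8.
    static void convert(Path source, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: FilesCharsetCopy <input file> <output file>");
            return;
        }
        // Assumption: windows-1252 input; swap in another charset if the file was saved differently.
        convert(Paths.get(args[0]), Charset.forName("windows-1252"), Paths.get(args[1]));
    }
}
```

One difference from InputStreamReader is worth knowing: a reader obtained from Files.newBufferedReader throws an IOException when it meets bytes that are invalid in the given charset, instead of silently substituting �, so a wrong charset guess tends to show up earlier.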

You need to ensure that you're reading the input file with the correct charset, as well as writing a file in a charset that supports the characters you're trying to write. UTF-8 is a suitable output file format.
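
On the output side, one way to check whether a charset supports the characters you are about to write is to ask its encoder. This is only a sketch: the class name `OutputCharsetCheck` and the helper `canRepresent` are made up for illustration, and the sample letters are written as Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OutputCharsetCheck {
    // Returns true if every character of 'text' can be encoded in 'charset'.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The Croatian letters č, ć, ž, š, đ, written as escapes.
        String croatian = "\u010D\u0107\u017E\u0161\u0111";

        System.out.println("US-ASCII: " + canRepresent(croatian, StandardCharsets.US_ASCII)); // false
        System.out.println("UTF-8:    " + canRepresent(croatian, StandardCharsets.UTF_8));    // true
    }
}
```

This fits the comment thread above: US-ASCII cannot represent these letters at all, while UTF-8 can represent any of them.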
