I have a Windows-1252 word document that I want to convert to UTF-8. I need to do this to correctly convert the doc file to a pdf. This is how I currently do it:
Path source = Paths.get("source.doc");
Path temp = Paths.get("temp.doc");
try (BufferedReader sourceReader = new BufferedReader(new InputStreamReader(new FileInputStream(source.toFile()), "windows-1252"));
BufferedWriter tempWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(temp.toFile()), "UTF-8"))) {
String line;
while ((line = sourceReader.readLine()) != null) {
tempWriter.write(line);
}
}
However, when I open the converted file (temp.doc
) in Word, it doesn't display some characters correctly. Ü will become ü for example.
How can I solve this? When I create a new BufferedReader (with UTF-8 encoding) and I read temp
, the characters are shown correctly in the console of my IDE.