I'm trying to process some data with Apache POI. I'm generating a Word file, and since I'm German, I'm using umlauts (ä, ö, ü), which aren't displayed correctly in the generated file. I think a Word file doesn't have an encoding set by default, but I'm not sure.

Is there a way to set an encoding on the XWPFDocument object?

Help would be appreciated! Thank you very much!

Edit:

Sample code:

XWPFDocument doc = new XWPFDocument();
XWPFTable table = doc.createTable();
XWPFTableRow row = table.getRow(0);
XWPFTableCell cell = row.getCell(0);
cell.setText("äüöß");
FileOutputStream fos = new FileOutputStream(new File("OutputFile.docx"));
doc.write(fos);
doc.close();

Output is as follows:

https://i.stack.imgur.com/pLEOd.png

`System.getProperty("file.encoding")` returns `windows-1252`. Setting the property to UTF-8 didn't help.

Cindy Meister
habnix
  • Writing characters with umlauts (and other non-ASCII characters) to a Word (docx) document using Apache POI should be straightforward, with no need for any special handling for encodings. Can you edit your question, and add in a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? If you provide the relevant formatted code, that will help. – andrewJames Mar 25 '20 at 23:58
  • Sadly it isn't as straightforward as one would think. I've just edited the initial post with the code and an image of the output. Thank you for your answer! – habnix Mar 26 '20 at 01:32
  • Try: `OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8")` and then `doc.write(osw)` – Abra Mar 26 '20 at 01:43
  • What value does this return: `System.getProperty("file.encoding")` – Abra Mar 26 '20 at 01:58
  • Hi Abra, I tried using a writer before; unfortunately `doc.write()` only accepts an `OutputStream` as a parameter. `System.getProperty("file.encoding")` returns `windows-1252` – habnix Mar 26 '20 at 02:05
  • @habnix: have you verified that your Java file is compiled with the correct encoding? It's possible that your string constant doesn't contain the text you intended it to. Try printing `"äüöß".length()`. If it prints 8 instead of 4, then that is probably the problem. – Joachim Sauer Mar 26 '20 at 09:51
  • @Abra: both of these are pointless. Apache POI doesn't write text files, so passing it a `Writer` won't work (and would be pointless if it did), and I'm pretty sure that `file.encoding` is also not involved in POI writing those files. – Joachim Sauer Mar 26 '20 at 09:52
  • Cannot reproduce. Your simple example works for me using `apache poi 4.1.2` and `Java 12` on Linux as well as Windows, and generates a `Word` document with all the characters properly encoded in the table cell. What encoding is your `*.java` file stored in? What encoding is `javac` using while compiling? Do they match? – Axel Richter Mar 26 '20 at 09:53
  • @JoachimSauer: Yes, it actually prints 8. I still don't have an idea how to fix it. – habnix Mar 26 '20 at 11:03
  • Okay, so I've changed the Global and Project encoding to UTF-8 via Settings->Editor->File encodings. `string.length()` still printed 8. – habnix Mar 26 '20 at 11:13
  • @habnix: that means your Java source file was compiled to .class files with the wrong encoding. See [this question](https://stackoverflow.com/questions/1726174/how-to-compile-a-java-source-file-which-is-encoded-as-utf-8) on how to fix that. – Joachim Sauer Mar 26 '20 at 11:13
  • @habnix: I don't know what IDE you're talking about. This is not about the editor, but about the compiler. – Joachim Sauer Mar 26 '20 at 11:13
  • I'm using IntelliJ. I haven't been able to solve the issue using the link either, even when using the project's path and setting the encoding to `UTF-8`. – habnix Mar 26 '20 at 13:12
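
The `length()` check suggested in the comments can be reproduced without the compiler. The following is a minimal sketch, assuming the source file is saved as UTF-8 but read as windows-1252; the class and method names are hypothetical, and the string is written with Unicode escapes so that the sketch itself does not depend on the source encoding. Each of the four characters takes two bytes in UTF-8, and windows-1252 decodes every byte as a separate character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingCheck {

    // Encodes the text as UTF-8 and decodes the bytes with another charset,
    // imitating a compiler that reads a UTF-8 source file with the wrong encoding.
    static String decodeUtf8BytesAs(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // a-umlaut, u-umlaut, o-umlaut, sharp s, written as Unicode escapes
        String intended = "\u00E4\u00FC\u00F6\u00DF";
        String garbled = decodeUtf8BytesAs(intended, Charset.forName("windows-1252"));

        System.out.println(intended.length()); // prints 4
        System.out.println(garbled.length());  // prints 8
    }
}
```

A length of 8 therefore points at the compile step rather than at Apache POI: the string is already wrong before it reaches `cell.setText(...)`.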

1 Answer

Make sure the file encodings are set to UTF-8 in IntelliJ. If UTF-8 wasn't set before, you might have to recreate the .java file containing the code shown in your question, because changed encoding settings only take effect for new files. Restarting the IDE is probably also a good idea.

If you are using Gradle, also add this line to your `build.gradle`:

[compileJava, compileTestJava]*.options*.encoding = 'UTF-8'

https://intellij-support.jetbrains.com/hc/en-us/community/posts/360006974119-File-encoding-ignored-by-compiler
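
To confirm whether the settings took effect, one option is to compare the literal against its `\uXXXX` escaped form, which the compiler reads the same way regardless of the source encoding. This is only a sketch; the class and method names are hypothetical. Replace the value of `text` in `main` with your own `"äüöß"` literal to test your build.

```java
public class UmlautLiteralCheck {

    // The intended text written with Unicode escapes (plain ASCII in the source file).
    static final String ESCAPED = "\u00E4\u00FC\u00F6\u00DF";

    // Returns true if the given string contains exactly the intended characters.
    static boolean matchesEscaped(String text) {
        return ESCAPED.equals(text);
    }

    // Lists the code points of a string, e.g. "U+00E4 U+00FC".
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Replace ESCAPED with your own string literal to test the compiler encoding.
        String text = ESCAPED;
        System.out.println("code points: " + codePoints(text));
        System.out.println("matches escaped form: " + matchesEscaped(text));
    }
}
```

If the raw literal does not match, the escaped form can be passed to `cell.setText(...)` as a stopgap until the compiler encoding is sorted out.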

Mario Varchmin