-1

I have the following code to convert a docx file to a PDF file. My docx content includes text boxes and Chinese characters.

String myFilePath = "testing.docx";

File docxFile = new File(myFilePath);
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(docxFile);

Mapper identifierFontMapper = new IdentityPlusMapper();
wordprocessingMLPackage.setFontMapper(identifierFontMapper);

// Note: this second call replaces the IdentityPlusMapper set above
Mapper bestMatchingMapper = new BestMatchingMapper();
wordprocessingMLPackage.setFontMapper(bestMatchingMapper);

Docx4J.toPDF(wordprocessingMLPackage, new FileOutputStream(myFilePath + ".pdf"));

With this code I am able to produce the PDF file, but the Chinese characters come out as #####.

Is there any way to solve this problem?

Here is my document.xml

JasonPlutext
codingDummy

1 Answer

1

Assuming you have docx4j-export-FO on your classpath, so that you are using XSL FO export, you should be able to see what characters are missing glyphs (turn on DEBUG logging for org.docx4j.fonts), and map a suitable font.

See for example https://github.com/plutext/docx4j-export-FO/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutPDFviaXSLFO.java#L144

EDIT 29 Sept.

I see:

WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font Calibri is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font SimHei is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font Arial is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font Wingdings is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font 華康中黑體 is not mapped to a physical font!

WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Calibri,normal,700" not found. Substituting with "any,normal,700".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "这" (0x8fd9) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "些" (0x4e9b) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "都" (0x90fd) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "只" (0x53ea) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "是" (0x662f) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "测" (0x6d4b) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "试" (0x8bd5) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "而" (0x800c) not available in font "Times-Bold".
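
When the log is long, it can help to group the "Glyph ... not available in font ..." warnings by font name, so you can see which font names need a mapping. Here is a minimal sketch in plain Java, with no docx4j or FOP calls. The class name `GlyphWarningScanner` and the method name are made up for illustration, and the regular expression assumes the warning format shown in the log above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j or FOP.
public class GlyphWarningScanner {

    // Assumed format: Glyph "X" (0xNNNN) not available in font "Y".
    private static final Pattern MISSING_GLYPH = Pattern.compile(
            "Glyph \"(.+?)\" \\(0x[0-9a-fA-F]+\\) not available in font \"([^\"]+)\"");

    /** Groups missing glyphs by the font FOP fell back to, in order of first appearance. */
    public static Map<String, Set<String>> missingGlyphsByFont(List<String> logLines) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = MISSING_GLYPH.matcher(line);
            if (m.find()) {
                // group(2) is the font name, group(1) the glyph that has no outline in it
                result.computeIfAbsent(m.group(2), k -> new LinkedHashSet<>()).add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two lines copied from the log above; \u8fd9 is the first missing character
        List<String> sample = Arrays.asList(
                "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - "
                        + "Font \"Calibri,normal,700\" not found. Substituting with \"any,normal,700\".",
                "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - "
                        + "Glyph \"\u8fd9\" (0x8fd9) not available in font \"Times-Bold\".");
        System.out.println(missingGlyphsByFont(sample));
    }
}
```

Each key in the returned map is a font name that would need to be mapped to a physical font containing the listed glyphs.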

Note the Glyph X not available in font Y messages. Therefore I'd need something like:

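    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);

    // "SimSun" is only a placeholder: use a Chinese font actually installed on your OS
    PhysicalFont chineseFont = PhysicalFonts.get("SimSun");
    if (chineseFont != null) {
        fontMapper.put("Times-Bold", chineseFont);
    }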
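
To find out which installed fonts could serve as the mapping target, one option is to ask the JDK which fonts can display the characters reported as missing. The sketch below uses only `java.awt`. The class name `InstalledFontFinder` is hypothetical. The font names AWT reports may not match exactly the names docx4j discovers, and Java's logical fonts (for example Dialog) may also show up in the output, so treat the result as a list of candidates to try rather than a definitive answer.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of docx4j.
public class InstalledFontFinder {

    /** Returns the names of the candidate fonts that can display every character of sample. */
    public static List<String> fontsCovering(Font[] candidates, String sample) {
        List<String> names = new ArrayList<>();
        for (Font font : candidates) {
            // canDisplayUpTo returns -1 when the font has a glyph for every character
            if (font.canDisplayUpTo(sample) == -1) {
                names.add(font.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The eight characters the FOP log reports as missing, written as Unicode escapes
        String sample = "\u8fd9\u4e9b\u90fd\u53ea\u662f\u6d4b\u8bd5\u800c";
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (String name : fontsCovering(installed, sample)) {
            System.out.println(name);
        }
    }
}
```

If the program prints nothing, no font visible to the JVM covers those characters, and a suitable Chinese font would have to be installed first.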
JasonPlutext
  • So docx4j-export-FO is the dependency you mean, for example docx4j-export-fo-3.3.6.jar, placed under Dependencies rather than Non-classpath Dependencies? For the Chinese characters, all I can get from the log is this: "[org.docx4j.fonts.fop.util.FopConfigUtil] (default task-75) Document font ????? is not mapped to a physical font!" I believe the ???? is the font name. As for the text boxes, do you mean there is no way to show them in the PDF file? – codingDummy Sep 27 '18 at 10:46
  • Please add XML to your question (unzip your docx then go into word/document.xml), so we can see the font specified. Also, please make a second question for your text box issue (and add the XML for that there). – JasonPlutext Sep 27 '18 at 19:57
  • See also https://github.com/plutext/docx4j/blob/master/src/test/java/org/docx4j/fonts/RunFontSelectorChinese2Test.java – JasonPlutext Sep 27 '18 at 19:59
  • I have already attached a link to my document.xml; please take a look. – codingDummy Sep 28 '18 at 02:41
  • Going by the latest solution you posted, does that mean I need to hard-code the font needed for that document? – codingDummy Oct 01 '18 at 02:54
  • The best solution is to ensure the required font is installed on the computer. If you can't do that, then you have to provide a mapping to a font which is installed (and which contains the glyph). – JasonPlutext Oct 01 '18 at 06:03