Docx4j - Replacing Word merge field with HTML content

Question

I am trying to replace a Word merge field named "test" with HTML content using Docx4j:

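    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import org.docx4j.model.fields.merge.DataFieldName;
    import org.docx4j.model.fields.merge.MailMerger;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    String myText = "<html><body><h1>Hello</h1></body></html>";
    try {
        WordprocessingMLPackage docxOut = WordprocessingMLPackage.load(new File("/tmp/template.docx"));
        Map<DataFieldName, String> data = new HashMap<>();
        // Map the merge field "test" to the HTML string
        data.put(new DataFieldName("test"), myText);
        MailMerger.performMerge(docxOut, data, true);
        docxOut.save(new File("/tmp/newTemplate.docx"));
    } catch (Docx4JException e) {
        LOGGER.error(e.getMessage());
    }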

In the output (newTemplate.docx), the merge field is replaced by the literal text

"<html><body><h1>Hello</h1></body></html>"

instead of being interpreted as HTML. I tried adding:

docxOut.getContentTypeManager().addDefaultContentType("html", "text/html");

but it still didn't work. I am not even sure if interpreting HTML while replacing a Word merge field can be done using Docx4j or if I'm missing something.

Any help would be welcome.

score 0 · Answer 1 · answered Nov 26 '14 at 11:11

You can use the OpenDoPE approach to bind a content control to a Custom XML element which contains escaped XHTML.
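To illustrate what "escaped XHTML" means here, the following is a minimal sketch that is not part of the original answer. The markup is stored as text inside an XML element, so its angle brackets and ampersands have to be escaped. The helper name escapeXml is hypothetical, and an XML library would normally do this escaping for you when you set an element's text content.

```java
class XhtmlEscape {
    // Hypothetical helper: escapes the characters that are special in XML text
    // so that XHTML markup can be stored as plain character data.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &lt;h1&gt;Hello&lt;/h1&gt;
        System.out.println(escapeXml("<h1>Hello</h1>"));
    }
}
```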

JasonPlutext

Does this mean Docx4j alone can't inject html? – Jenna SMITH Nov 26 '14 at 11:30
It means you should use docx4j to do content control data binding if you want automatic conversion of XHTML, not MERGEFIELDs. Alternatively, you can manually convert the XHTML, using the ImportXHTML code. – JasonPlutext Nov 26 '14 at 18:43
I thought about that too, but xhtmlImporterImpl.convert(htmlContent, null) returns a List – Jenna SMITH Nov 27 '14 at 09:47
@JennaSMITH it's pointless having the XHTML importer return a string, because you need to parse XHTML entities into the correct docx structures (i.e. images, paragraphs, runs, etc.) -- the XHTML importer code returns a collection of these very entities, which you can write to the docx file in place of the relevant merge fields. – Ben Nov 27 '14 at 13:10
@Ben The only way I know to replace merge fields is MailMerger.performMerge(docxOut, data, true), where data maps each field name to a String value. Is there another method for the same purpose to which I can pass a List? – Jenna SMITH Nov 27 '14 at 13:59
@JennaSMITH Understood. The mailmerge code is designed to replace placeholders with textual content. But as you've found out, that's not enough for imported mark-up: you need custom code to replace the mailmerge field placeholders with the relevant entities. – Ben Nov 28 '14 at 13:12
@Ben OK, thank you, I'll look into that later. Since I had a deadline to meet, I just used docxOut.getMainDocumentPart().getContent().addAll(importer.convert(htmlInput, null)); convert accepts a String, and this appends the converted HTML at the end of the file. – Jenna SMITH Dec 01 '14 at 11:38
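
The "custom code" idea from the comments can be sketched with plain Java lists. In docx4j, both getMainDocumentPart().getContent() and importer.convert(...) yield lists of objects, so replacing a placeholder comes down to splicing one list into another at the placeholder's position rather than appending at the end. This is only a simplified model: the class and method names are hypothetical, strings stand in for docx4j objects, and finding the paragraph that actually holds the merge field (which Word may split across several runs) is a separate step not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeFieldSplice {
    // Hypothetical helper: replaces the first occurrence of `placeholder`
    // in `content` with every element of `replacement`, keeping their order.
    // Returns false and leaves `content` untouched if the placeholder is absent.
    static boolean replaceWith(List<Object> content, Object placeholder, List<Object> replacement) {
        int index = content.indexOf(placeholder); // relies on equals()
        if (index < 0) {
            return false;
        }
        content.remove(index);              // drop the placeholder itself
        content.addAll(index, replacement); // insert the converted content in its place
        return true;
    }

    public static void main(String[] args) {
        // Strings stand in for the paragraphs of a document body.
        List<Object> content = new ArrayList<>(
                Arrays.asList("intro paragraph", "MERGEFIELD test", "closing paragraph"));
        List<Object> converted = Arrays.asList((Object) "heading: Hello");

        replaceWith(content, "MERGEFIELD test", converted);
        // prints [intro paragraph, heading: Hello, closing paragraph]
        System.out.println(content);
    }
}
```

Compared with calling addAll without an index, the converted content ends up where the field was instead of at the end of the document.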
