I need to write a Java application that can merge docx files. Any suggestions?
By "merge," do you mean some simple sort of concatenation? Or something fancier? Is the difficulty the merge part or the docx (rather than doc) part? – Matt Ball Mar 22 '10 at 17:57
The merge should give the same result as manually opening the first document in MS Office, pressing Ctrl+C, then opening the second document, going to its end, and pressing Ctrl+V. – Roman Mar 22 '10 at 18:07
4 Answers
With POI my solution is:
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.XmlOptions;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;

// Appends the body of src2 to the body of src1 and writes the result to dest.
public static void merge(InputStream src1, InputStream src2, OutputStream dest) throws Exception {
    OPCPackage src1Package = OPCPackage.open(src1);
    OPCPackage src2Package = OPCPackage.open(src2);
    XWPFDocument src1Document = new XWPFDocument(src1Package);
    CTBody src1Body = src1Document.getDocument().getBody();
    XWPFDocument src2Document = new XWPFDocument(src2Package);
    CTBody src2Body = src2Document.getDocument().getBody();
    appendBody(src1Body, src2Body);
    src1Document.write(dest);
}

// Splices the child XML of 'append' in just before the closing tag of 'src'.
private static void appendBody(CTBody src, CTBody append) throws Exception {
    XmlOptions optionsOuter = new XmlOptions();
    optionsOuter.setSaveOuter();
    String appendString = append.xmlText(optionsOuter);
    String srcString = src.xmlText();
    // Opening tag, children, and closing tag of the target body.
    String prefix = srcString.substring(0, srcString.indexOf(">") + 1);
    String mainPart = srcString.substring(srcString.indexOf(">") + 1, srcString.lastIndexOf("<"));
    String suffix = srcString.substring(srcString.lastIndexOf("<"));
    // Children of the body being appended.
    String addPart = appendString.substring(appendString.indexOf(">") + 1, appendString.lastIndexOf("<"));
    CTBody makeBody = CTBody.Factory.parse(prefix + mainPart + addPart + suffix);
    src.set(makeBody);
}
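The string handling in appendBody can be tried in isolation, without POI on the classpath. The following is a minimal sketch of that splice on plain XML strings; the class BodySplice and its methods are hypothetical names made up for illustration, not part of POI or XMLBeans. It assumes the outer element is not self-closing and that no attribute value in the opening tag contains ">".

```java
// Illustrative only: BodySplice and its method names are hypothetical, not POI API.
class BodySplice {

    /** Returns everything between the first '>' and the last '<', i.e. the element's children. */
    static String innerXml(String element) {
        return element.substring(element.indexOf('>') + 1, element.lastIndexOf('<'));
    }

    /** Wraps the children of target, followed by the children of appended, in target's outer tags. */
    static String splice(String target, String appended) {
        String openTag = target.substring(0, target.indexOf('>') + 1);
        String closeTag = target.substring(target.lastIndexOf('<'));
        return openTag + innerXml(target) + innerXml(appended) + closeTag;
    }

    public static void main(String[] args) {
        String first = "<w:body><w:p>one</w:p></w:body>";
        String second = "<w:body><w:p>two</w:p></w:body>";
        // prints <w:body><w:p>one</w:p><w:p>two</w:p></w:body>
        System.out.println(splice(first, second));
    }
}
```

Only the children of the two elements are concatenated. Anything the appended content refers to outside the body, such as images or styles, is not brought along by this step.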
With Docx4j my solution is:
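import org.apache.commons.io.IOUtils;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.contenttype.ContentType;
import org.docx4j.openpackaging.io.SaveToZipFile;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.CTAltChunk;

public class MergeDocx {
    // Counter used to give each embedded document a unique part name.
    private static long chunk = 0;
    private static final String CONTENT_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    public void mergeDocx(InputStream s1, InputStream s2, OutputStream os) throws Exception {
        WordprocessingMLPackage target = WordprocessingMLPackage.load(s1);
        insertDocx(target.getMainDocumentPart(), IOUtils.toByteArray(s2));
        SaveToZipFile saver = new SaveToZipFile(target);
        saver.save(os);
    }

    // Embeds the second document as an altChunk part and references it from the main document.
    private static void insertDocx(MainDocumentPart main, byte[] bytes) throws Exception {
        AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + (chunk++) + ".docx"));
        afiPart.setContentType(new ContentType(CONTENT_TYPE));
        afiPart.setBinaryData(bytes);
        Relationship altChunkRel = main.addTargetPart(afiPart);
        // Named altChunk so it does not shadow the static counter above.
        CTAltChunk altChunk = Context.getWmlObjectFactory().createCTAltChunk();
        altChunk.setId(altChunkRel.getId());
        main.addObject(altChunk);
    }
}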

Thank you for this answer. Your POI code works for me, but in my case I also need to merge .doc files, so I have to use org.apache.poi.hwpf.HWPFDocument. Following your docx code, I want to get the XML from the .doc file, but I didn't find a way to do it. Any ideas would be appreciated :) – Amira Mar 21 '14 at 15:08
@atott The POI code worked for me, but if the appended document has images, they are missing after the merge. All the text is merged with its exact formatting. – Mahaveer Singh Oct 20 '16 at 07:10
The solution worked for me only after adding the following dependencies: poi:3.11, poi-ooxml:3.11, ooxml-schemas:1.1. Couldn't make it with newer versions of the poi library. – Cléssio Mendes Jun 09 '17 at 04:36
@Guillaume You can add a page break in the last paragraph of the first document :) – isah Apr 26 '18 at 18:03
The following Java APIs are available to handle OpenXML MS Word documents with Java:
There was one more, but I don't recall the name anymore.
As to your functional requirement: merging two documents so that the result is what the end user would expect is technically tricky. Most APIs won't do that for you. You'll need to extract the desired information from the two documents and then create one new document from it yourself.
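One reason this is tricky is that a .docx file is a ZIP package made of several parts: the text lives in word/document.xml, while styles, numbering, images, and the relationships between them are stored in separate entries, and a merged document has to reconcile all of them. Listing the parts of each input is a quick way to see what would need to be carried over. The sketch below uses only java.util.zip; DocxParts and fakeDocx are hypothetical names for illustration, and the part names in main are a typical layout rather than output from a real document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative only: DocxParts is a hypothetical helper, not part of POI or docx4j.
class DocxParts {

    /** Lists the names of the parts stored in a .docx (or any ZIP) stream, in archive order. */
    static List<String> list(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.isDirectory()) {
                    names.add(e.getName());
                }
            }
        }
        return names;
    }

    /** Builds a tiny in-memory ZIP with empty entries, standing in for a real document. */
    static byte[] fakeDocx(String... partNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : partNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = fakeDocx("[Content_Types].xml", "word/document.xml",
                "word/styles.xml", "word/_rels/document.xml.rels", "word/media/image1.png");
        for (String name : list(new ByteArrayInputStream(sample))) {
            System.out.println(name);
        }
    }
}
```

For a real file, pass a FileInputStream to list instead of the in-memory sample.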

How do you decide which to use? I'm torn between Apache POI and OpenOffice.org. The second one would require installing OpenOffice, which I think would hurt performance. Is that true? – Roger Dec 12 '12 at 23:08
For more on why it is technically tricky, see http://www.docx4java.org/blog/2010/11/merging-word-documents/ – JasonPlutext Mar 20 '14 at 05:37
I guess the best way to decide which to use is to try them with your documents. You can try a commercial tool based on docx4j, at http://webapp.docx4java.org/OnlineDemo/forms/upload_MergeDocx.xhtml – JasonPlutext Mar 20 '14 at 05:53
Where did you get the information that Apache POI XWPF is dead? As far as I can see, there were still releases going on in 2016: https://poi.apache.org/changes.html – Patrik Bego Apr 20 '16 at 14:08
@Patrik: I updated the answer. The project was just stalling at the moment of writing the answer. – BalusC Apr 20 '16 at 14:11
@BalusC Ahh ok. But on their home page it is stated " Work is progressing for Word documents (HWPF+XWPF) " under Mission Info. – Patrik Bego Apr 20 '16 at 14:17
The Aspose API is the best so far for merging Word .doc or .docx files, but it is neither free nor open source. If you need a free, open-source tool, there are a couple of APIs you can choose from. You can find a review of them here:
http://www.esupu.com/open-source-office-document-java-api-review/

It sure looks like POI can work with docx files. Are you trying to figure out how to merge them?
How to extract plain text from a DOCX file using the new OOXML support in Apache POI 3.5?