
I have a string variable that contains formatted HTML text, and I need to convert it to a .doc file using apache-poi.

I already have a solution for .docx files using docx4j, but the client wants one based on apache-poi that converts an HTML string to both .doc and .docx.

So how can I convert a formatted HTML string to .doc and .docx files using apache-poi in Spring Boot?

Edit: for the opposite direction (.doc and .docx to an HTML string), this is the code I already use.

For .doc:

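    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.w3c.dom.Document;

    // Converts the bytes of a binary .doc file to an HTML string.
    private String getDocHtmlText(byte[] contents)
            throws IOException, ParserConfigurationException, TransformerException {
        // Read from memory instead of a fixed-name temp file, which concurrent requests would overwrite.
        try (InputStream input = new ByteArrayInputStream(contents);
             HWPFDocument wordDocument = new HWPFDocument(input)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            WordToHtmlConverter converter = new WordToHtmlConverter(doc);
            converter.processDocument(wordDocument);

            // Serialize the resulting DOM as indented HTML.
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(converter.getDocument()), new StreamResult(output));
            // Decode with the same charset the serializer wrote, not the platform default.
            return new String(output.toByteArray(), StandardCharsets.UTF_8);
        }
    }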
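
The DOM-to-string step in the .doc method is plain JAXP, so it can be tried without POI. Below is a minimal sketch, assuming a hand-built DOM in place of the converter's output; the class and method names (`DomToHtmlSketch`, `sampleDocument`, `toHtml`) are made up for illustration. Because the serializer is told to write UTF-8, the bytes are decoded as UTF-8 as well rather than with the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlSketch {

    // Hypothetical stand-in for the DOM that WordToHtmlConverter would fill in.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Serializes a DOM as HTML into a byte buffer, then decodes it explicitly as UTF-8.
    static String toHtml(Document doc) throws Exception {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(doc), new StreamResult(output));
        return new String(output.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

Passing the charset explicitly keeps the result the same on servers whose default encoding is not UTF-8.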

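For .docx:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    // XHTMLConverter, XHTMLOptions and FileURIResolver come from the xdocreport
    // XWPF-to-XHTML converter (built on Apache POI), not from Apache POI itself.
    private String getDocxHtmlText(byte[] contents) throws IOException {
        // Read from memory instead of a fixed-name temp file, which concurrent requests would overwrite.
        try (InputStream in = new ByteArrayInputStream(contents);
             XWPFDocument document = new XWPFDocument(in);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));
            XHTMLConverter.getInstance().convert(document, out, options);
            return out.toString();
        }
    }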
stackUser
  • Possible duplicate of [Convert HTML to docx - Apache POI Java](https://stackoverflow.com/questions/32405933/convert-html-to-docx-apache-poi-java) – g00glen00b Jan 22 '19 at 12:33
  • I thought it was possible because of this answer (https://stackoverflow.com/a/5403453/9024680). So how can I convert an HTML text string to a .doc file, given that docx4j only handles .docx files? – stackUser Jan 22 '19 at 12:56
  • Also, I have used apache-poi to convert .doc and .docx to an HTML string. – stackUser Jan 22 '19 at 13:04
  • For converting an HTML text string to `*.docx` using `apache poi`, with `jsoup` for traversing the HTML, see my example in https://stackoverflow.com/questions/54268485/how-to-set-define-different-styles-for-the-same-paragraph/54275245#54275245. Creating `*.doc`? Well, I never made any progress in doing even the simplest things with the `apache poi` `HWPF` stuff. I have given up on that. – Axel Richter Jan 22 '19 at 14:45

0 Answers