
I'm trying to convert an XHTML file to DOCX and found the following example code:

wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(new File(inputfilepath), null, wordMLPackage) );
wordMLPackage.save(new java.io.File(System.getProperty("user.dir") + "/html_output.docx") );

It seems that all the data will be loaded into memory; is that right? If the XHTML (including images) is a big file, it may cause an OutOfMemoryError (OOM). Does anyone know how to prevent this?

Many thanks!
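To measure how much heap the conversion actually needs (for example the 650M peak reported in the comments), one JDK-only option is to reset and then read the peak-usage counters of the heap memory pools around the conversion call. This is a minimal sketch, assuming Java 8+; `PeakHeap` and `measurePeak` are hypothetical names, and the array allocation in `main` is only a stand-in for the `XHTMLImporter.convert(...)` call. Because each pool reaches its peak at a different moment, the summed value is an upper bound rather than an exact peak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class PeakHeap {

    /** Resets the peak-usage counters of all heap memory pools to their current usage. */
    static void resetHeapPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
    }

    /**
     * Sum of the per-pool heap peaks since the last reset. Pools peak at
     * different moments, so this is an upper bound, not an exact peak.
     */
    static long heapPeakBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                total += pool.getPeakUsage().getUsed();
            }
        }
        return total;
    }

    /** Runs a task and returns the approximate peak heap usage in bytes. */
    static long measurePeak(Runnable task) {
        resetHeapPeaks();
        task.run();
        return heapPeakBytes();
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // the -Xmx ceiling
        // Stand-in workload (hypothetical): replace with the XHTMLImporter.convert(...) call.
        long peak = measurePeak(() -> {
            int[] pixels = new int[2000 * 2000]; // ~15 MiB, like one decoded 2000x2000 image
            pixels[0] = 1;
        });
        System.out.printf("peak heap ~%d MiB of %d MiB max%n", peak >> 20, maxHeap >> 20);
    }
}
```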

  • Yes, it will load all the data into memory. Is your input exceptionally large, or your environment memory constrained? – JasonPlutext Mar 28 '13 at 01:33
  • Thanks for your response. I'm developing a migration tool that converts old webdocs (HTML) to DOCX; I don't actually know whether there are any large webdocs in production. During testing, I found that when the input includes images, the memory cost becomes very large. For example: 3K of HTML text linking 2 images with a total image size of 70.8K; during conversion the peak memory usage was 650M. – simpletosimple Mar 28 '13 at 02:21
  • I just ran another test with the JVM heap increased to -Xms256m -Xmx1024m: 4K of HTML linking 10 image files (total image size 423K). The conversion threw an OOM. – simpletosimple Mar 28 '13 at 02:38
  • Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.awt.image.DataBufferInt.<init>(DataBufferInt.java:75) ...... at org.docx4j.org.xhtmlrenderer.util.ImageUtil.createCompatibleBufferedImage(ImageUtil.java:111) at org.docx4j.org.xhtmlrenderer.util.ImageUtil.convertToBufferedImage(ImageUtil.java:240) ...... at org.docx4j.convert.in.xhtml.XHTMLImporter.convert(XHTMLImporter.java:312) – simpletosimple Mar 28 '13 at 02:39
  • I changed the HTML content to link only 4 images, and the OOM still appeared. I found that one image cost almost 250M of memory; with 3 images the conversion works, but the memory cost is almost 800M. – simpletosimple Mar 28 '13 at 02:53
  • I found the root cause: it was the height and width settings on the image (<img>) tags in my HTML. After removing height="" and width="", it works very well: the conversion runs fast and uses little memory. :) – simpletosimple Mar 28 '13 at 03:11
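The stack trace fails inside `DataBufferInt`, which stores one `int` (4 bytes) per pixel, so the heap cost of an image depends on the pixel size it is rendered at, not on its compressed file size. The sketch below shows that arithmetic, assuming a 4-byte-per-pixel buffer such as `BufferedImage.TYPE_INT_ARGB`; `DecodedImageSize` and `intRasterBytes` are hypothetical names. Under that assumption, the ~250M per image reported in the comments corresponds to roughly 8000 × 8000 pixels, which suggests (but does not prove) that the `height`/`width` values made the importer scale the image to a very large raster.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

public class DecodedImageSize {

    /** Bytes needed by an int-per-pixel raster (e.g. TYPE_INT_ARGB): 4 bytes per pixel. */
    static long intRasterBytes(int width, int height) {
        return (long) width * height * Integer.BYTES;
    }

    public static void main(String[] args) {
        // Check the formula against a real (small) BufferedImage backed by a DataBufferInt.
        BufferedImage img = new BufferedImage(100, 50, BufferedImage.TYPE_INT_ARGB);
        DataBufferInt buf = (DataBufferInt) img.getRaster().getDataBuffer();
        long actual = (long) buf.getSize() * Integer.BYTES;
        System.out.println(actual == intRasterBytes(100, 50)); // true

        // What matters is the pixel size the image is rendered at, not the file size.
        System.out.println(intRasterBytes(800, 600) >> 20);   // 1 (MiB)
        System.out.println(intRasterBytes(8000, 8000) >> 20); // 244 (MiB)
    }
}
```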
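The fix from the last comment (dropping the `height`/`width` attributes) could be applied automatically as a preprocessing step before the conversion. This is a minimal JDK-only sketch, assuming the input is well-formed XHTML held in a string; `StripImgSize` and `stripImgDimensions` are hypothetical names, not part of docx4j. Without the attributes the images are presumably sized from their intrinsic pixel dimensions, so it is worth checking the layout of the resulting DOCX.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StripImgSize {

    /** Returns the XHTML with width/height attributes removed from every img element. */
    static String stripImgDimensions(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Don't download the XHTML DTD; named entities such as &nbsp; may then
            // fail to resolve, so they may need replacing first.
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // "*" matches img elements in any namespace, including the XHTML one.
            NodeList imgs = doc.getElementsByTagNameNS("*", "img");
            for (int i = 0; i < imgs.getLength(); i++) {
                Element img = (Element) imgs.item(i);
                img.removeAttribute("width");
                img.removeAttribute("height");
            }

            // Serialize the modified DOM back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrite XHTML", e);
        }
    }

    public static void main(String[] args) {
        String in = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<img src=\"a.png\" width=\"800\" height=\"600\"/></body></html>";
        // Prints the same document with the img's width/height attributes removed.
        System.out.println(stripImgDimensions(in));
    }
}
```

The rewritten string can then be saved to a file and passed to the conversion call shown in the question.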
