I have a requirement to convert multiple DOCX files into HTML and, if possible, into RTF. Docx4j seems to be a good Java library for doing this.
The HtmlExporterNG2.html method does not give me the result I want. So I am thinking of modifying the stylesheet that is extracted from the docx file and then using it for the conversion. The docx files all have different formatting, so I cannot use one standard stylesheet.
Am I correct in thinking that modifying the stylesheet at runtime will work? And what are the important things I should be aware of?
I am using it in a standalone Java application on Java 6.
My question may be a bit vague, but I am looking for the right direction at this point.

Swift-Tuttle
What is your "desired result"? Do you want to ignore the formatting in the input docx, or override it in certain respects? – JasonPlutext Mar 18 '13 at 21:12
1 Answer
@Jason I want to ignore certain formatting in the input docx, because the converted HTML had extra spacing, junk characters and similar artifacts added to it.
As a solution I created a new XSLT. For the most part it is very similar to the one in the sample, with a few minor tweaks. The new XSLT now converts the input docx file into HTML that is formatted the way I need for IE8, Mozilla and Chrome.
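
To illustrate the general idea of a tweaked stylesheet, here is a minimal sketch that uses only the JDK's javax.xml.transform API. It does not call docx4j, and it is not the XSLT I actually used. The `<doc>`/`<p>` input is a simplified stand-in for the real document XML, and the class name `CustomXsltSketch` and the stylesheet content are hypothetical. It shows one kind of tweak: a more specific template that drops whitespace-only paragraphs, which is one way extra spacing can be suppressed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CustomXsltSketch {

    // Hypothetical stylesheet (assumption, not the docx4j sample XSLT):
    // maps a simplified <doc>/<p> input to HTML and drops paragraphs
    // that contain nothing but whitespace.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/doc'>"
      + "<html><body><xsl:apply-templates select='p'/></body></html>"
      + "</xsl:template>"
      // The predicate gives this template a higher default priority than
      // the plain 'p' template, so empty paragraphs produce no output.
      + "<xsl:template match='p[normalize-space(.) = \"\"]'/>"
      + "<xsl:template match='p'>"
      + "<p><xsl:value-of select='normalize-space(.)'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p>  Hello   world </p><p>   </p><p>Second</p></doc>";
        System.out.println(transform(xml, XSLT));
    }
}
```

Because the stylesheet is loaded as an ordinary string, it can be edited or swapped at runtime before the `Transformer` is created. How a custom stylesheet is handed to docx4j itself depends on the docx4j version, so check its samples for that part.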

Swift-Tuttle