Rewritten to look more like a programming question
Okay, so I have done a little more research, and it looks like the Java package I need to use is docx4j. Unfortunately, my lack of familiarity with the package, as well as with the underpinnings of the PDF format, makes it difficult for me to figure out exactly how to make use of the headers and footers returned by SectionWrapper.getHeaderFooterPolicy(). It's not entirely clear whether the HeaderPart and FooterPart objects returned are writable, or how to modify them.
There is this code which offers an example of how to create a header part, but it creates a new HeaderPart and adds it to the document.
I want to find existing header/footer parts and either remove them if possible or empty them out. Ideally they would be entirely gone from the document.
This code is similar and allows you to set the text of a HeaderPart using setJaxbElement, but so much of this terminology is unfamiliar, and I'm concerned the end result will be me creating headers (albeit empty ones) in each document rather than removing them.
Original Question Below
I am dealing with a set of wildly varying MS Word documents. I am compiling them into a single PDF and want to make sure that none of them have headers or footers before doing so.
Ideally, I'd also like to override their default font if it isn't Times New Roman.
Is there any way to do this programmatically or using some sort of batch process?
I will be running this on a Windows server that doesn't currently have Office or Word installed (although I think it might have an install of OpenOffice, and of course it's easy to just add an install as well).
Right now I'm using some version of iText (Java) to convert the files to PDF. I know that apparently iText can't do things like removing headers/footers, but since the underlying structure of modern Word files (.docx) is XML, I'm wondering if there is an API (or even an XML parsing/editing API or, if all else fails, a RegEx [horrors]) for removing the headers and footers and setting some default styles.
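To make the XML-editing idea concrete, here is a minimal sketch using only the JDK's DOM API, not docx4j. It assumes the file is a .docx (a ZIP archive whose main part is word/document.xml) and that the contents of that part have already been read into a string. It removes every w:headerReference and w:footerReference element, which are what tie a section to its header and footer parts. The class and method names (HeaderFooterStripper, stripHeaderFooterRefs) are made up for illustration. I have not checked this against every kind of document, and the orphaned header/footer parts stay in the package, just unreferenced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j or iText.
final class HeaderFooterStripper {

    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Takes the text of word/document.xml and returns it with all
    // header/footer references removed from every section.
    static String stripHeaderFooterRefs(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the NS-aware lookup below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            for (String name : new String[] {"headerReference", "footerReference"}) {
                NodeList refs = doc.getElementsByTagNameNS(W_NS, name);
                // The NodeList is live, so walk it backwards while removing.
                for (int i = refs.getLength() - 1; i >= 0; i--) {
                    Node ref = refs.item(i);
                    ref.getParentNode().removeChild(ref);
                }
            }

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process document.xml", e);
        }
    }
}
```

The result would then need to be written back into the ZIP in place of the original word/document.xml (for example with java.util.zip) before the file is handed to the PDF conversion step.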