docx4java's SectionWrapper.getHeaderFooterPolicy -- can I use this to remove headers & footers

Question

Rewritten to look more like a programming question

Okay, so I have done a little more research and it looks like the Java package I need to use is docx4j. Unfortunately, my lack of familiarity with the package, as well as with the underpinnings of the PDF format, makes it difficult for me to figure out exactly how to make use of the headers and footers returned by SectionWrapper.getHeaderFooterPolicy(). It's not entirely clear whether the HeaderPart and FooterPart objects returned are writeable or how to modify them.

There is this code which offers an example of how to create a header part but it creates a new HeaderPart and adds it to the document.

I want to find existing header/footer parts and either remove them if possible or empty them out. Ideally they would be entirely gone from the document.

This code is similar and allows you to set the text of a HeaderPart using setJaxbElement, but so much of this terminology is unfamiliar, and I'm concerned the end result will be me creating headers (albeit empty ones) in each document rather than removing them.

Original Question Below

I am dealing with a set of wildly varying MS Word documents. I am compiling them into a single PDF and want to make sure that none of them have headers or footers before doing so.

Ideally, I'd also like to override their default font if it isn't Times New Roman.

Is there any way to do this programmatically or using some sort of batch process?

I will be running this on a Windows server that doesn't currently have Office or Word installed (although I think it might have an install of OpenOffice, and of course it's easy to just add an install as well).

Right now I'm using some version of iText (Java) to convert the files to PDF. I know that apparently iText can't do things like removing headers/footers, but since the underlying structure of modern .doc files is XML, I'm wondering if there is an API (or even an XML parsing/editing API or, if all else fails, a RegEx [horrors]) for removing the headers and footers and setting some default styles.

It's an interesting question, but I don't think SO is a good place to ask it. — Steve P., Aug 02 '13 at 19:22
Sorry, I'll reword it to make it clear I meant something more like an API. — Jordan Reiter, Aug 02 '13 at 19:46
Okay, I removed the word "tool" and added some of my initial findings on the docx4j Java package which is *probably* but not definitely what I will need to use. — Jordan Reiter, Aug 02 '13 at 19:49
Part of it is probably just my more generous definition for tool. For example, I would consider `sed` and `awk` as tools but they're extremely useful for use in batch processing and I think discussing their use would be on-topic for SO. I hope that you can help me with my question. — Jordan Reiter, Aug 02 '13 at 19:54

score 2 · Accepted Answer · answered Aug 03 '13 at 00:05

Here is some code hot off the press to do what you want:

public class HeaderFooterRemove {

    public static void main(String[] args) throws Exception {

        // A docx or a dir containing docx files
        String inputpath = System.getProperty("user.dir") + "/testHF.docx";

        // Collects the names of the files that were processed
        StringBuilder sb = new StringBuilder();

        File dir = new File(inputpath);

        if (dir.isDirectory()) {
            // Process every docx in the directory
            String[] files = dir.list();
            for (int i = 0; i < files.length; i++) {
                if (files[i].endsWith("docx")) {
                    sb.append("\n\n" + files[i] + "\n");
                    removeHFFromFile(new java.io.File(inputpath + "/" + files[i]));
                }
            }
        } else if (inputpath.endsWith("docx")) {
            // Process the single docx
            sb.append("\n\n" + inputpath + "\n");
            removeHFFromFile(new java.io.File(inputpath));
        }

        System.out.println(sb.toString());
    }

    // Note: the file is modified in place (saved back over the input)
    public static void removeHFFromFile(File f) throws Exception {

        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(f);

        MainDocumentPart mdp = wordMLPackage.getMainDocumentPart();

        // Remove the header/footer references from each sectPr
        SectPrFinder finder = new SectPrFinder(mdp);
        new TraversalUtil(mdp.getContent(), finder);
        for (SectPr sectPr : finder.getSectPrList()) {
            sectPr.getEGHdrFtrReferences().clear();
        }

        // Remove rels; collect them first so the list is not modified while iterating
        List<Relationship> hfRels = new ArrayList<Relationship>();
        for (Relationship rel : mdp.getRelationshipsPart().getRelationships().getRelationship()) {
            if (rel.getType().equals(Namespaces.HEADER)
                    || rel.getType().equals(Namespaces.FOOTER)) {
                hfRels.add(rel);
            }
        }
        for (Relationship rel : hfRels) {
            mdp.getRelationshipsPart().removeRelationship(rel);
        }

        wordMLPackage.save(f);
    }
}

The above code relies on SectPrFinder, so copy that somewhere.

I've left the imports out for brevity, but you can copy those from GitHub.

When it comes to making the set of docx into a single PDF, obviously you can either merge them into a single docx, then convert that to PDF, or convert them all to PDF, then merge those PDFs. If you prefer the former approach (for example, because end-users want to be able to edit the package of documents), then you may wish to consider our commercial extension for docx4j, MergeDocx.

Github is down for maintenance at the moment, but as soon as I can get in I'll take a look! Thanks! — Jordan Reiter, Aug 06 '13 at 07:42

score 1 · Answer 2 · edited May 23 '17 at 12:28

To remove the headers/footers, there is a fairly easy solution:

Open the docx as a zip archive and remove the files named header*.xml/footer*.xml (located in the word folder).

Structure of a unzipped docx: https://stackoverflow.com/tags/docx/info

You also need to remove the links to those files (if you don't, the document will probably be corrupted):

Edit the document.xml.rels file and remove all the Relationship elements that point to a footer/header. This is an example of a relationship that you should remove:

<Relationship Id="rId13" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer" Target="footer2.xml"/>

More generally, remove every Relationship whose Type ends in 'footer' or 'header'.
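The steps above can be sketched in plain Java using only the JDK's zip and XML classes. This is a minimal sketch under stated assumptions, not a tested replacement for docx4j: the class name DocxHeaderFooterStripper and its method names are made up for illustration, it assumes the main part lives at word/document.xml, and besides dropping the parts and their relationships it also strips the headerReference/footerReference elements from document.xml and the matching Override entries from [Content_Types].xml. The output has not been verified against Word itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class; all names here are illustrative.
class DocxHeaderFooterStripper {

    // True for header/footer parts (word/header1.xml) and their own rels files.
    public static boolean isHeaderFooterPart(String entryName) {
        return entryName.matches("word/(_rels/)?(header|footer)\\d*\\.xml(\\.rels)?");
    }

    // True for any element that links to, or declares, a header/footer part.
    static boolean isHeaderFooterElement(Element e) {
        String name = e.getLocalName();
        if ("Relationship".equals(name)) {
            String type = e.getAttribute("Type");
            return type.endsWith("/relationships/header")
                    || type.endsWith("/relationships/footer");
        }
        if ("Override".equals(name)) {
            return e.getAttribute("PartName").matches("/word/(header|footer)\\d*\\.xml");
        }
        return "headerReference".equals(name) || "footerReference".equals(name);
    }

    // Parses the XML, removes every header/footer-related element, serializes it again.
    public static byte[] stripXml(byte[] xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input may be untrusted.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        // The NodeList is live, so collect matches first and remove afterwards.
        NodeList all = doc.getElementsByTagNameNS("*", "*");
        List<Element> doomed = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (isHeaderFooterElement(e)) {
                doomed.add(e);
            }
        }
        for (Element e : doomed) {
            e.getParentNode().removeChild(e);
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Copies the docx entry by entry, skipping header/footer parts and
    // cleaning the three XML files that refer to them.
    public static void strip(Path in, Path out) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(in));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                if (isHeaderFooterPart(name)) {
                    continue; // drop the part entirely
                }
                byte[] data = zin.readAllBytes(); // reads the current entry only
                if (name.equals("word/document.xml")
                        || name.equals("word/_rels/document.xml.rels")
                        || name.equals("[Content_Types].xml")) {
                    data = stripXml(data);
                }
                zout.putNextEntry(new ZipEntry(name));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }
}
```

A call such as DocxHeaderFooterStripper.strip(Path.of("in.docx"), Path.of("out.docx")) writes a cleaned copy and leaves the input untouched. Images used only by the removed headers/footers stay in the archive as unreferenced parts; removing those as well is left out to keep the sketch short.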

answered Aug 02 '13 at 22:35 · edi9999

You also need to remove the header footer references from each sectPr element – JasonPlutext Aug 03 '13 at 00:06
Where can they be found? – edi9999 Aug 03 '13 at 06:49
