PDFBox : How can a PDAcroForm be flattened?

Question

I am using the PDFBox library to populate PDF forms, but I am not able to flatten them. I have already tried the following solutions:

 PDAcroForm acroForm = docCatalog.getAcroForm();
 PDField field = acroForm.getField( name );
 field.setReadonly(true); //Solution 1
 field.getDictionary().setInt("Ff",1);//Solution 2

But nothing seems to be working. Please suggest a solution for the same.

You talk about *flattening PDF forms* but present sample code which merely tries to set the form fields read-only, something completely different from form flattening. So which do you actually want? That being asked, there is no out-of-the-box single-method-call support for form flattening in PDFBox, but its low-level API allows implementing form flattening. Beware, though, it is *possible*, not *easy*. I've just googled around a bit, and the PDFBox form flattening methods I saw are mostly broken or work only in special, easy cases. — mkl, Oct 30 '15 at 09:32
I want to remove the editable fields from the Acroform after they have been populated by the data. I could only find the above mentioned ways to do the same using PDFBox but no luck yet. — aanchal, Nov 02 '15 at 05:42
Ok. The code you found is only about setting the fields read-only, which is not exactly what you want. As you mention that the form is populated with data, I assume you want that data to still be visible when the fields are removed. Thus, the form contents must be copied into the page contents before removing the form fields. This is not trivial, and I don't have working code for that either. If I find enough time, I'll look into it later. — mkl, Nov 02 '15 at 11:30
@Maruan's answer indicates that pdfbox 2.0 is going to include an explicit form flattening method. Thus, I won't be trying to handcode it. — mkl, Nov 03 '15 at 05:20
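
To illustrate why the two attempts in the question do not flatten anything: both only touch the field flags entry (/Ff), where ReadOnly is bit 1. The following is a minimal, PDFBox-free sketch of that bit arithmetic. The class and method names are hypothetical; the bit positions are the ones the PDF specification assigns to field flags. It also shows that writing the literal value 1, as "Solution 2" does, would discard any other flags the field already carries.

```java
// Hypothetical helper, not part of PDFBox: plain bit arithmetic on a /Ff value.
final class FieldFlagsDemo {
    static final int READ_ONLY = 1;        // bit 1: field is not editable
    static final int REQUIRED  = 1 << 1;   // bit 2: field must have a value on export
    static final int MULTILINE = 1 << 12;  // bit 13: text fields only

    // Mirrors setInt("Ff", 1): the previous flags are ignored and replaced.
    static int overwriteWithReadOnly(int existingFlags) {
        return READ_ONLY;
    }

    // Sets only the ReadOnly bit and keeps every other flag.
    static int addReadOnly(int existingFlags) {
        return existingFlags | READ_ONLY;
    }

    static boolean has(int flags, int mask) {
        return (flags & mask) != 0;
    }

    public static void main(String[] args) {
        int flags = REQUIRED | MULTILINE; // assumed flags of some existing text field
        System.out.println("overwritten: " + overwriteWithReadOnly(flags));
        System.out.println("merged:      " + addReadOnly(flags));
        System.out.println("multiline kept after merge: "
                + has(addReadOnly(flags), MULTILINE));
    }
}
```

In both variants the field and its widget annotation remain in the document; only the viewer's editing behaviour changes, which is why the form still is not flattened.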

score 2 · Answer 1 · answered Nov 03 '15 at 00:46

PDFBox 2.0.0 has a PDAcroForm.flatten() method.

Maruan Sahyoun

Ah, that's interesting news. – mkl Nov 03 '15 at 05:18
Has PDFBox 2.0.0 been released officially? Or is it an upcoming version? – aanchal Nov 03 '15 at 06:19
As far as I know, PDFBox 2.0 release candidate 1 is available now. – mkl Nov 03 '15 at 07:25
Hhmmm, I just had a look at that `PDAcroForm.flatten()` code. It does not yet look finished. The global flag `isContentStreamWrapped` should have been a flag field, one flag per page. Furthermore, it does not create missing appearance streams. – mkl Nov 03 '15 at 10:46
isContentStreamWrapped: agreed, that should be different. Creating missing appearance streams: that's on purpose. Creating missing appearance streams (and supporting the NeedAppearances flag) is tagged for 2.1. I'd also rather have that done in a separate method and have flatten() expect that the appearance streams do exist. – Maruan Sahyoun Nov 03 '15 at 11:32
While the JavaDocs of `flatten` do explain that *the current appearance* is made *part of the pages content stream*, only people who happen to know about annotation appearance streams and the **NeedAppearances** flag will understand what that means. For casual PDFBox users there should be a warning along the lines of "PDFs do not necessarily have appearances for all their annotations; PDF viewers in such cases generate them. If this may be the case for your PDF, consider creating missing appearances before flattening." – mkl Nov 06 '15 at 09:45
I've added a capability to refresh the field appearance prior to flatten - thx @mkl for your feedback – Maruan Sahyoun Nov 12 '15 at 17:38

Matyas · Answer 2 · 2018-07-10T20:30:36.627

As mentioned by Maruan, the PDAcroForm.flatten() method may be used.

However, the fields might need some preprocessing first. Most importantly, in our case the nested field structure had to be traversed and the DV and V entries of each field checked for values.

In our case what worked was:

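import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;

// Written against the PDFBox 2.0 API.
private static void flattenPDF(String src, String dst) throws IOException {
    // try-with-resources closes the document even if flattening fails
    try (PDDocument doc = PDDocument.load(new File(src))) {
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm acroForm = catalog.getAcroForm();
        if (acroForm != null) { // a PDF without a form has nothing to flatten
            // Start from empty default resources; fallback fonts are added on demand
            PDResources resources = new PDResources();
            acroForm.setDefaultResources(resources);

            // Copy the list so that processing does not work on the live field list
            List<PDField> fields = new ArrayList<>(acroForm.getFields());
            processFields(fields, resources);
            acroForm.flatten();
        }
        doc.save(dst);
    }
}

private static void processFields(List<PDField> fields, PDResources resources) {
    fields.forEach(f -> {
        f.setReadOnly(true);
        COSDictionary cosObject = f.getCOSObject();
        // Use the default value (DV) if present, otherwise the current value (V)
        String value = cosObject.getString(COSName.DV) == null ?
                       cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
        System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
        try {
            // Setting the value again makes PDFBox regenerate the appearance
            f.setValue(value);
        } catch (IOException e) {
            String message = e.getMessage(); // may be null
            if (message != null && message.matches("Could not find font: /.*")) {
                // Register Helvetica under the missing font name and retry once
                String fontName = message.replaceAll("^[^/]*/", "");
                System.out.println("Adding fallback font for: " + fontName);
                resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
                try {
                    f.setValue(value);
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            } else {
                e.printStackTrace();
            }
        }
        // Descend into nested fields
        if (f instanceof PDNonTerminalField) {
            processFields(((PDNonTerminalField) f).getChildren(), resources);
        }
    });
}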
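
The font fallback above depends on pulling the missing font name out of the exception message. As a small, PDFBox-free sketch of just that parsing step: the class and method names below are hypothetical, and the message format "Could not find font: /<name>" is taken from the code above rather than from any documented PDFBox contract, so it may differ between versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the font name from a "Could not find font" message.
final class FontNameDemo {
    // Assumed message format, copied from the answer's code: "Could not find font: /Helv"
    private static final Pattern MISSING_FONT =
            Pattern.compile("Could not find font: /(.+)");

    // Returns the name after the slash, or empty if the message has another shape.
    static Optional<String> missingFontName(String message) {
        if (message == null) {
            return Optional.empty(); // exceptions may carry no message at all
        }
        Matcher m = MISSING_FONT.matcher(message);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(missingFontName("Could not find font: /Helv"));
        System.out.println(missingFontName("Some other I/O problem"));
        System.out.println(missingFontName(null));
    }
}
```

Unlike the matches/replaceAll pair, this variant uses a single pattern with a capture group, guards against a null message, and requires a non-empty name. Matching on exception text stays fragile either way, so it is probably best treated as a workaround.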
