Following the SO question "Java pdfBox: Fill out pdf form, append it to pddocument, and repeat", I had trouble appending a cloned page to a new PDF.

The code from that answer seemed really interesting, but it didn't work for me.

Actually, the answer doesn't work because it's the same PDField that you always modify and add to the list. So the next time you call `getField` with the initial name, it won't find it and you get an NPE. I tried with the same PDFBox version (1.8.12) used in the nice GitHub project, but I can't understand how he got it working.

I had the same issue today when trying to append a form to several pages, each with different values. I wondered whether the solution was to duplicate the fields instead, but I can't manage to do it properly: I always end up with a PDF containing the same values in every form.

(I provided a link to the template document for mkl, but I have since removed it because the document doesn't belong to me.)

Edit: Following mkl's advice, I figured out what I was missing, but performance is really bad when duplicating every page, and the file size isn't satisfying either. Maybe there's a way to optimize this by reusing the identical parts of the PDF.

Tcheg
  • *"the answer doesn't work because this is the same PDField you always modify and add to the list. So the next time you call 'getField' with initial name, it won't find it and you get an NPE."* - please look at the code again, it is *not* the same `PDField`, it is the field with the same, original name from a **newly loaded copy of the original source document**, so if the field is found the first time, it will be found again and again. Thus, considering your false analysis, you probably have not used the original code from that answer but instead a somehow broken version of it. Please show it. – mkl May 11 '18 at 09:37
  • And please also share your template document. Probably the field is not there at all and your analysis of an NPE situation was from the first access to the field and not the second one, and you misinterpreted the situation due to that. – mkl May 11 '18 at 09:41
  • Thank you for your quick reply! You're right about the newly loaded copy; I missed that. In my code I tried to reuse the same PDDocument, loaded once. That seemed reasonable because the template never changes, but I guess this process alters it. I'll try again with this information and come back soon. – Tcheg May 11 '18 at 10:08
  • Yes, it works, but, as I expected, performance is really bad and, on top of that, the PDF file size grows steeply with the number of pages. I thought I could reuse the same template to save some MB with code based on widget duplication (like here: https://stackoverflow.com/questions/39260500/pdfbox-2-0-2-how-to-copy-pdtextfields-between-documents). If this is actually possible, could I post some code, just to find out whether my expectations are realistic? – Tcheg May 11 '18 at 10:20
  • Hmmm. Can you share your template document? Depending on what actually makes your merged file big, it is more or less easy to prevent duplicate data. E.g. it should be easy to have all imported pages use the same page resource objects. A different optimization would help prevent duplicate annotation resources. – mkl May 11 '18 at 10:43
  • I just added the template PDF as a link. As for the code, it's pretty similar to the one you provided in the other SO thread. I have 5 fields in my form, which I want to duplicate with different values on each page. – Tcheg May 11 '18 at 12:19
  • Ok, at 2.5 MB of template information, you're likely to want to prevent duplicate data in the merge... I'll look into that later, probably Monday. – mkl May 11 '18 at 12:29
  • Ok, as you already have found a solution yourself, no need for me looking into that anymore ;) – mkl May 12 '18 at 14:14

1 Answer

Finally, I got it working without reloading the template each time, and the resulting file is what I wanted: not too big (4 MB for 164 pages). I think I made two mistakes before: one in page creation, and probably one in field duplication. Here is the working code, in case someone else gets stuck on the same problem.

Form creation:

    // Form of the merged document; reuse the template form's default resources (/DR)
    PDAcroForm finalForm = new PDAcroForm(finalDoc, new COSDictionary());
    finalForm.setDefaultResources(originForm.getDefaultResources());
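
A slightly fuller version of this form setup, as a minimal sketch assuming PDFBox 2.0.x (`TargetFormSketch` and `createTargetForm` are hypothetical names, not part of PDFBox). Besides the template's default resources (/DR), it also copies the template AcroForm's form-level default appearance (/DA) when there is one, because text fields may inherit their /DA from the AcroForm and appearance generation needs it.

```java
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

public class TargetFormSketch {
    // Hypothetical helper: creates the merged document's AcroForm from the template's one
    static PDAcroForm createTargetForm(PDDocument finalDoc, PDAcroForm originForm) {
        // This constructor starts the form with an empty /Fields array
        PDAcroForm finalForm = new PDAcroForm(finalDoc);
        // Reuse the template's default resources (/DR), e.g. the fonts named in /DA strings
        finalForm.setDefaultResources(originForm.getDefaultResources());
        // Copy the form-level /DA if present; setItem with a null value removes the key
        COSBase da = originForm.getCOSObject().getDictionaryObject(COSName.DA);
        finalForm.getCOSObject().setItem(COSName.DA, da);
        finalDoc.getDocumentCatalog().setAcroForm(finalForm);
        return finalForm;
    }
}
```

Because the merged form reuses the template's `PDResources` object instead of a copy, the template document should stay open until the merged document has been saved.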

Page creation:

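    PDPage clonedPage = templateDocument.getPage(0);

    // Shallow copy of the page dictionary: the clone shares the template's
    // content streams and resources instead of duplicating them
    COSDictionary clonedDict = new COSDictionary(clonedPage.getCOSObject());

    // Drop the template's widget annotations; new widgets are added per field
    clonedDict.removeItem(COSName.ANNOTS);
    clonedPage = new PDPage(clonedDict);
    finalDoc.addPage(clonedPage);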
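
Why this keeps the file small: `new COSDictionary(...)` only makes a shallow copy, so each cloned page references the very same content stream and resource objects as the template page, and those shared objects are written only once when the merged document is saved. Two caveats, as far as I can tell from PDFBox 2.0.x: attributes such as /Resources and /MediaBox may be inherited from the template's page tree instead of sitting in the page dictionary, and if the template's /Contents is an array, `flatten()` appends its new content stream to that array in place, so an array shared by all clones would collect the flattened values of every page. The following sketch (hypothetical names, assuming PDFBox 2.0.x) handles both while still sharing the streams themselves.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class PageCloneSketch {
    // Hypothetical helper: appends a lightweight clone of templatePage to target
    static PDPage clonePage(PDPage templatePage, PDDocument target) {
        // Shallow copy: the values (content streams, resources, ...) are shared objects
        COSDictionary dict = new COSDictionary(templatePage.getCOSObject());
        // Template widgets belong to the template's fields; new widgets are added later
        dict.removeItem(COSName.ANNOTS);
        // The clone gets a new /Parent, so resolve attributes that were only inherited
        // from the template's page tree (the getters walk up that tree)
        if (!dict.containsKey(COSName.RESOURCES)) {
            dict.setItem(COSName.RESOURCES, templatePage.getResources());
        }
        if (!dict.containsKey(COSName.MEDIA_BOX)) {
            dict.setItem(COSName.MEDIA_BOX, templatePage.getMediaBox());
        }
        if (!dict.containsKey(COSName.CROP_BOX)) {
            dict.setItem(COSName.CROP_BOX, templatePage.getCropBox());
        }
        if (!dict.containsKey(COSName.ROTATE)) {
            dict.setInt(COSName.ROTATE, templatePage.getRotation());
        }
        COSBase contents = dict.getDictionaryObject(COSName.CONTENTS);
        if (contents instanceof COSArray) {
            // Private array, shared streams: flatten() appending to this page's
            // /Contents no longer leaks into the other clones
            COSArray shared = (COSArray) contents;
            COSArray own = new COSArray();
            for (int i = 0; i < shared.size(); i++) {
                own.add(shared.get(i)); // same stream objects, possibly indirect references
            }
            dict.setItem(COSName.CONTENTS, own);
        }
        PDPage page = new PDPage(dict);
        target.addPage(page); // sets /Parent to the target's page tree
        return page;
    }
}
```

In the code above, `PDPage clonedPage = PageCloneSketch.clonePage(templateDocument.getPage(0), finalDoc);` would stand in for the page creation statements.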

Field duplication: (rename field to become unique and set value)

    // Look up the template field by its original name (the template itself is never renamed)
    PDTextField field = (PDTextField) originForm.getField(fieldName);
    PDPage page = finalDoc.getPages().get(nPage);
    // New field belonging to the merged document's AcroForm
    PDTextField clonedField = new PDTextField(finalForm);
    List<PDAnnotationWidget> widgetList = new ArrayList<>();
    for (PDAnnotationWidget paw : field.getWidgets()) {
        // New widget with the template widget's appearance string (/DA) and position (/Rect)
        PDAnnotationWidget newWidget = new PDAnnotationWidget();
        newWidget.getCOSObject().setString(COSName.DA, paw.getCOSObject().getString(COSName.DA));
        newWidget.setRectangle(paw.getRectangle());
        widgetList.add(newWidget);
    }
    clonedField.setQ(field.getQ()); // copy the text alignment, e.g. centered
    clonedField.setWidgets(widgetList);
    clonedField.setValue(value);
    clonedField.setPartialName(fieldName + cnt++); // unique name per copy
    fields.add(clonedField);

    // Place the new widgets on the target page
    page.getAnnotations().addAll(clonedField.getWidgets());
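
The same steps wrapped in a reusable method, as a minimal sketch assuming PDFBox 2.0.x (`FieldCloneSketch`, `cloneTextField` and the `_p` naming scheme are hypothetical). Compared with the snippet above, it also copies the inherited /DA onto the new field itself, since, as far as I know, PDFBox's appearance generation looks up /DA through the field and its AcroForm, and it sets each widget's /P entry to its page. The separator in the generated name avoids collisions that plain concatenation with a shared counter can produce: `"amount1" + 1` and `"amount" + 11` both give `amount11`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public class FieldCloneSketch {
    // Hypothetical helper: copies a template text field onto `page` of the merged
    // document under a unique name and fills it with `value`.
    static PDTextField cloneTextField(PDTextField field, PDAcroForm finalForm, PDPage page,
                                      int pageIndex, String value) throws IOException {
        PDTextField clonedField = new PDTextField(finalForm);
        // "_p" separator: "amount1" on page 1 and "amount" on page 11 stay distinct
        clonedField.setPartialName(field.getPartialName() + "_p" + pageIndex);
        // Copy the inherited /DA (from the template field or its AcroForm) onto the new field
        clonedField.setDefaultAppearance(field.getDefaultAppearance());
        clonedField.setQ(field.getQ()); // text alignment (quadding)
        List<PDAnnotationWidget> widgets = new ArrayList<>();
        for (PDAnnotationWidget templateWidget : field.getWidgets()) {
            PDAnnotationWidget widget = new PDAnnotationWidget();
            // Widget-level /DA, if any (a null value just leaves the entry out)
            widget.getCOSObject().setString(COSName.DA,
                    templateWidget.getCOSObject().getString(COSName.DA));
            widget.setRectangle(templateWidget.getRectangle()); // same position as in the template
            widget.setPage(page); // /P entry pointing to the target page
            widgets.add(widget);
        }
        clonedField.setWidgets(widgets); // sets /Kids and each widget's /Parent
        clonedField.setValue(value);     // generates the widgets' appearance streams
        page.getAnnotations().addAll(widgets);
        return clonedField;
    }
}
```

In the loop above this would become `fields.add(FieldCloneSketch.cloneTextField(field, finalForm, page, nPage, value));`, followed by the same `setFields` and `flatten()` calls at the end. The names stay unique as long as each template field is cloned once per page.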

And at the end of the process:

    finalDoc.getDocumentCatalog().setAcroForm(finalForm);
    finalForm.setFields(fields);
    // Draw each field's appearance into its page content and remove the form fields
    finalForm.flatten();
Tcheg
  • I think there may be an issue in that code: you go through the steps of generating your `clonedField` but then you rename the original `field` and add that to AcroForm and page. – mkl May 12 '18 at 06:24
  • Actually, you're right ;) I just botched my copy/paste. My real code is slightly different, split into more methods than the version I posted here. I've edited the answer so it now accurately reflects the original. Thanks again for your help! – Tcheg May 12 '18 at 11:46
  • For an even better result in general, you should also import the default resources of the template document into the **AcroForm** of the result document. – mkl May 12 '18 at 12:32
  • Actually, I create my final AcroForm like this: `this.finalForm = new PDAcroForm(finalDoc, new COSDictionary()); finalForm.setDefaultResources(originForm.getDefaultResources());` Is that what you mean by "import the default resources"? Maybe I could still improve it, but that will be for another day. – Tcheg May 13 '18 at 17:03
  • *"Is it what you mean about "import the default resources" ?"* - yes. And you should add it to the answer because it is important. – mkl May 13 '18 at 21:53
  • Done! I added a new section heading for it. – Tcheg May 15 '18 at 18:37
  • As it works for you, you should eventually accept your answer by clicking the tick at its upper left. – mkl May 16 '18 at 04:26