
I have been trying to replace iText (which we used a long time ago) with PDFBox in one of our projects. Essentially, we fill out PDF forms, much like a mail merge. This all happens in a J2EE backend system, so usually only a few fonts are available.

We recently received a PDF that I could not fill out. I always got this exception:

Exception in thread "main" java.io.IOException: Could not find font: /MicrosoftSansSerif
    at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFont(PDDefaultAppearanceString.java:176)
    at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:129)
    at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:105)
    at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:87)
    at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.getWidgetDefaultAppearanceString(AppearanceGeneratorHelper.java:302)
    at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:197)
    at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:264)
    at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
    at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:219)
    at test.Z.main(Z.java:142)

According to the answer to "PDFBox API: How to change font to handle Cyrillic values in an AcroForm field", you should catch this IOException and replace the font with one of the built-in fonts. However, even after doing this, I received the same exception:

// defaultResources is acroForm.getDefaultResources()
String da = pdField.getDefaultAppearance();
boolean found = false;
for (COSName fontName : defaultResources.getFontNames()) {
    // require a space after the name, so "/Helv" does not also match "/HelveticaNeue-..."
    if (da != null && da.startsWith("/" + fontName.getName() + " ")) {
        found = true;
        break;
    }
}
if (!found) {
    System.out.println("font: " + da + " not found, replacing with Helv");
    pdField.setDefaultAppearance("/Helv 0 Tf 0 g");
    System.out.println("AFTER REPLACE DA: " + pdField.getDefaultAppearance());
    pdField.getWidgets().get(0).setAppearance(null);
}
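Matching with startsWith assumes the font operator comes first in the DA, and it has to guard against one resource name being a prefix of another (/Helv versus /HelveticaNeue-ThinItalic). A minimal sketch, assuming whitespace-separated tokens as in the DA strings printed in the output further down, instead looks for the operand two places before the Tf operator. The class and method names are hypothetical, not PDFBox API:

```java
import java.util.Set;

/** Hypothetical helper: locate the font resource name in a DA string by token, not by prefix. */
public class DaFontName {

    /** Returns the font name without its leading slash, or null if no "/Name size Tf" sequence is found. */
    static String fontName(String da) {
        if (da == null) {
            return null;
        }
        // Assumes operands and operators are separated by whitespace
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            // Tf takes two operands, the font resource name and the size: "/Helv 12 Tf"
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        return null;
    }

    /** True only if the exact font name from the DA is among the default-resource font names. */
    static boolean fontIsAvailable(String da, Set<String> defaultResourceFonts) {
        String name = fontName(da);
        return name != null && defaultResourceFonts.contains(name);
    }

    public static void main(String[] args) {
        // Font names listed under "form default resources" in the question's output
        Set<String> dr = Set.of("Helv", "HelveticaNeue-ThinItalic", "MSReferenceSansSerif", "ZaDb");
        System.out.println(fontName("/Helv 12 Tf 0.000000 0.000000 0.000000 rg "));   // Helv
        System.out.println(fontIsAvailable("/MicrosoftSansSerif 0 Tf 0 g", dr));       // false
        System.out.println(fontIsAvailable("0 g /HelveticaNeue-ThinItalic 9 Tf", dr)); // true
    }
}
```

With PDFBox, the set could be filled from defaultResources.getFontNames() by taking getName() of each COSName.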

Looking deeper, I discovered that the form field in question had multiple widgets. Iterating over them revealed that each of them had its own DA. The API doesn't give you direct access to it, so I read widget.getCOSObject().getString(COSName.DA) on each widget directly, and voilà, the missing font was sitting in there.

After adding the font replacement code for each widget, the form can now be filled out successfully.
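The stack trace goes through AppearanceGeneratorHelper.getWidgetDefaultAppearanceString, which suggests that PDFBox parses a widget's own DA when one is present and only otherwise uses the field's (inherited) DA. The following hypothetical model of that precedence, assuming this reading of the trace is right, shows why fixing only the field-level DA of field7 was not enough. It is not PDFBox code:

```java
/** Hypothetical model of which DA string is parsed for one widget; not PDFBox API. */
public class DaResolution {

    static String effectiveDa(String widgetDa, String fieldDa, String acroFormDa) {
        if (widgetDa != null) {
            return widgetDa;   // assumed: a widget-level DA takes precedence over the field's DA
        }
        if (fieldDa != null) {
            return fieldDa;    // DA is inheritable, so this may come from a parent field
        }
        return acroFormDa;     // document-wide default from the AcroForm dictionary
    }

    public static void main(String[] args) {
        // field7 from the question's output: no field-level DA, but each widget has its own
        String formDa = "/Helv 0 Tf 0 g";
        String widget1 = "/MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg";
        String widget2 = "/MSReferenceSansSerif 0 Tf 0.000000 0.000000 0.000000 rg";
        System.out.println(effectiveDa(widget1, null, formDa)); // the missing font is what gets parsed
        System.out.println(effectiveDa(widget2, null, formDa));
        System.out.println(effectiveDa(null, null, formDa));
    }
}
```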

There are two things I don't really understand:

  1. Is this the correct approach?
  2. Why is it so difficult? Wouldn't it be much simpler to have an option in PDFBox to automatically replace any missing font with a default?
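Regarding the second question, the effect of such an option can be approximated by rewriting each DA before calling setValue. The sketch below is an assumption about how one might do that, not an existing PDFBox setting. Unlike the hardcoded /Helv 0 Tf 0 g, it swaps only the font operand of Tf and keeps the size and color operators. It assumes whitespace-separated tokens and that the fallback font (Helv here) is present in the AcroForm default resources, as it is in this form; the class and method names are hypothetical:

```java
import java.util.Set;

/** Hypothetical "missing font -> default font" rewrite of a single DA string. */
public class DaFontFallback {

    static String withFallbackFont(String da, Set<String> availableFonts, String fallbackFont) {
        if (da == null) {
            return null;
        }
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            // "/Name size Tf": replace the name only if it is not in the default resources
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")
                    && !availableFonts.contains(tokens[i - 2].substring(1))) {
                tokens[i - 2] = "/" + fallbackFont;
            }
        }
        // Re-joining normalizes whitespace; size and color operands are kept as they were
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        Set<String> dr = Set.of("Helv", "HelveticaNeue-ThinItalic", "MSReferenceSansSerif", "ZaDb");
        // prints: /Helv 0 Tf 0.000000 0.000000 0.000000 rg
        System.out.println(withFallbackFont(
                "/MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg ", dr, "Helv"));
        // already available, so unchanged: /Helv 12 Tf 0 g
        System.out.println(withFallbackFont("/Helv 12 Tf 0 g", dr, "Helv"));
    }
}
```

In the question's widget loop, the result could be written back with widget.getCOSObject().setString(COSName.DA, ...) in place of the hardcoded string.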

Thank you for your input.

For reference, this is the code needed to fill out this form:

package test;

import java.io.File;
import java.io.IOException;
import java.util.Map.Entry;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;
import org.apache.pdfbox.pdmodel.interactive.form.PDVariableText;

public class Z {

    public static void main(String[] args) throws IOException {
        PDDocument pdfDocument = PDDocument.load(new File("/home/test/template.pdf"));

        System.out.println("page resources:");
        for (PDPage page : pdfDocument.getDocumentCatalog().getPages()) {
            PDResources resources = page.getResources();
            for (COSName key : resources.getFontNames()) {
                System.out.println("font key: " + key.getName());
                PDFont font = resources.getFont(key);
                System.out.println("font name: " + font.getName());
                boolean isEmbedded = font.isEmbedded();
                System.out.println("embedded: " + isEmbedded);
            }
        }

        PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
        if (acroForm != null) {
            System.out.println();
            System.out.println("form DA: " + acroForm.getDefaultAppearance());
            System.out.println("form DA (via COSObject): " + acroForm.getCOSObject().getString(COSName.DA));
            System.out.println();
            System.out.println("form default resources:");
            PDResources defaultResources = acroForm.getDefaultResources();
            for (COSName fontName : defaultResources.getFontNames()) {
                System.out.println("font name: " + fontName.getName());
            }
            System.out.println();
            System.out.println("form default resources font dictionary:");
            COSDictionary fontDict = (COSDictionary) defaultResources.getCOSObject().getDictionaryObject(COSName.FONT);
            for (Entry<COSName, COSBase> entry : fontDict.entrySet()) {
                System.out.println("dict entry: " + entry.getKey().getName());
            }

            System.out.println();
            System.out.println("form fields:");
            for (PDField pdFieldX : acroForm.getFields()) {
                // PDVariableText also covers combo and list boxes, so check for PDTextField before casting
                if (pdFieldX instanceof PDTextField) {
                    PDTextField pdField = (PDTextField) pdFieldX;
                    System.out.println("field mapping name: " + pdField.getMappingName());
                    System.out.println("field alternate name: " + pdField.getAlternateFieldName());
                    System.out.println("field partial name: " + pdField.getPartialName());
                    System.out.println("DA: " + pdField.getDefaultAppearance());
                    System.out.println("DA (via COSObject): " + pdField.getCOSObject().getString(COSName.DA));
                    if (pdField.getDefaultAppearance() != null) {
                        String da = pdField.getDefaultAppearance();
                        boolean found = false;
                        for (COSName fontName : defaultResources.getFontNames()) {
                            // require a space after the name, so "/Helv" does not also match "/HelveticaNeue-..."
                            if (da.startsWith("/" + fontName.getName() + " ")) {
                                found = true;
                                break;
                            }
                        }
                        if (!found) {
                            System.out.println("font: " + da + " not found, replacing with Helv");
                            pdField.setDefaultAppearance("/Helv 0 Tf 0 g");
                            System.out.println("AFTER REPLACE DA: " + pdField.getDefaultAppearance());
                            System.out.println("AFTER REPLACE DA (via COSObject): "
                                    + pdField.getCOSObject().getString(COSName.DA));
                        }
                    }
                    for (PDAnnotationWidget widget : pdField.getWidgets()) {
                        String da = widget.getCOSObject().getString(COSName.DA);
                        System.out.println("WIDGET DA: " + da);
                        // a widget without its own DA uses the field's DA, so there is nothing to replace here
                        if (da != null) {
                            boolean found = false;
                            for (COSName fontName : defaultResources.getFontNames()) {
                                // require a space after the name, so "/Helv" does not also match "/HelveticaNeue-..."
                                if (da.startsWith("/" + fontName.getName() + " ")) {
                                    found = true;
                                    break;
                                }
                            }
                            if (!found) {
                                System.out.println("font: " + da + " not found, replacing with Helv");
                                widget.getCOSObject().setString(COSName.DA, "/Helv 0 Tf 0 g");
                                System.out.println("AFTER REPLACE WIDGET DA: "
                                        + widget.getCOSObject().getString(COSName.DA));
                            }
                        }
                        widget.setAppearance(null);
                    }
                }
                System.out.println();
            }
            System.out.println();
            System.out.println();
            System.out.println();
            for (PDField pdFieldX : acroForm.getFields()) {
                if (pdFieldX instanceof PDTextField) {
                    PDTextField pdField = (PDTextField) pdFieldX;
                    System.out.println("setting value on field: " + pdField.getPartialName());
                    pdField.setValue("a value");
                }
            }
//          acroForm.setNeedAppearances(true);
//          acroForm.refreshAppearances();
//          acroForm.flatten();
        }

        // Save and close the filled out form.
        pdfDocument.save("/home/test/filled.pdf");
        pdfDocument.close();
    }

}

and its output:

page resources:
font key: TT0
font name: AAAAAD+PalatinoLinotype-Bold
embedded: true
font key: TT1
font name: AAAAAE+MicrosoftSansSerif
embedded: true
font key: TT2
font name: AAAAAG+MicrosoftSansSerif
embedded: true
font key: TT3
font name: AAAAAI+PalatinoLinotype-Roman
embedded: true
font key: TT4
font name: AAAAAJ+MicrosoftSansSerif
embedded: true
font key: TT5
font name: AAAAAF+MicrosoftSansSerif
embedded: true
font key: TT6
font name: AAAAAL+PalatinoLinotype-Bold
embedded: true
font key: TT7
font name: AAAAAN+PalatinoLinotype-Bold
embedded: true
font key: TT8
font name: AAAAAB+MicrosoftSansSerif
embedded: true
font key: TT9
font name: AAAAAP+ArialMT
embedded: true

form DA: /Helv 0 Tf 0 g 
form DA (via COSObject): /Helv 0 Tf 0 g 

form default resources:
font name: Helv
font name: HelveticaNeue-ThinItalic
font name: MSReferenceSansSerif
font name: ZaDb

form default resources font dictionary:
dict entry: Helv
dict entry: HelveticaNeue-ThinItalic
dict entry: MSReferenceSansSerif
dict entry: ZaDb

form fields:
field mapping name: null
field alternate name: null
field partial name: field1
DA: /Helv 12 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /Helv 12 Tf 0.000000 0.000000 0.000000 rg 
WIDGET DA: /Helv 12 Tf 0.000000 0.000000 0.000000 rg 

field mapping name: null
field alternate name: null
field partial name: field2
DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /Helv 0 Tf 0.000000 0.000000 0.000000 rg 
WIDGET DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg 

field mapping name: null
field alternate name: null
field partial name: field3
DA: /HelveticaNeue-ThinItalic 0 Tf 0 g
DA (via COSObject): /HelveticaNeue-ThinItalic 0 Tf 0 g
WIDGET DA: /HelveticaNeue-ThinItalic 0 Tf 0 g

field mapping name: null
field alternate name: null
field partial name: field4
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g

field mapping name: null
field alternate name: null
field partial name: field5
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g

field mapping name: null
field alternate name: null
field partial name: field6
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g

field mapping name: null
field alternate name: null
field partial name: field7
DA: /Helv 0 Tf 0 g 
DA (via COSObject): null
WIDGET DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE WIDGET DA: /Helv 0 Tf 0 g
WIDGET DA: /MSReferenceSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 

field mapping name: null
field alternate name: null
field partial name: field8
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g

field mapping name: null
field alternate name: null
field partial name: field9
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g

field mapping name: null
field alternate name: null
field partial name: field10
DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /Helv 0 Tf 0.000000 0.000000 0.000000 rg 
WIDGET DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg 

field mapping name: null
field alternate name: null
field partial name: field11
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg 
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg  not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g




setting value on field: field1
setting value on field: field2
setting value on field: field3
setting value on field: field4
setting value on field: field5
setting value on field: field6
setting value on field: field7
setting value on field: field8
setting value on field: field9
setting value on field: field10
setting value on field: field11

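field7 was the culprit in my case: it has no DA entry of its own, but its first widget references /MicrosoftSansSerif, which is not in the form's default resources.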

  • Well, strictly speaking the **DA** is an entry of the field objects, not of the widget objects. BUT in PDF versions before 2.0 it is sometimes difficult to distinguish pure widgets from minimal, anonymous fields with merged-in widgets. Maybe this is an issue of your PDF. An issue of your code, though: You iterate over `acroForm.getFields()` which are only the root fields of the form. Consider iterating over all of `acroForm.getFieldTree()` instead. – mkl Nov 30 '21 at 14:30
  • I see, thank you for the clarification. The PDF was apparently generated with PDFpenPro 12 on Mac OS X. According to their website, it is a fairly recent version; maybe the forms it generates can produce this kind of behavior. Thank you for pointing out the getFields() vs getFieldTree() calls, I will check on that to make sure we catch all fields. – Ronny Bremer Dec 01 '21 at 15:59
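To illustrate mkl's point about getFields() versus getFieldTree(): terminal fields can sit below non-terminal parent fields, so a loop over the root fields alone never visits them. The Field class below is a hypothetical stand-in for the PDF field hierarchy, not PDFBox's PDField; with PDFBox itself, the suggested change is simply to loop over acroForm.getFieldTree() instead of acroForm.getFields():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldTreeWalk {

    /** Hypothetical stand-in for a PDF form field: a name plus optional kid fields. */
    static final class Field {
        final String name;
        final List<Field> kids = new ArrayList<>();

        Field(String name, Field... kids) {
            this.name = name;
            this.kids.addAll(Arrays.asList(kids));
        }
    }

    /** Depth-first walk, parent before kids, collecting fully qualified names like "address.street". */
    static void collect(Field field, String parentName, List<String> out) {
        String fullName = parentName.isEmpty() ? field.name : parentName + "." + field.name;
        out.add(fullName);
        for (Field kid : field.kids) {
            collect(kid, fullName, out);
        }
    }

    /** Visits every field in the hierarchy, not just the roots. */
    static List<String> allFields(List<Field> roots) {
        List<String> out = new ArrayList<>();
        for (Field root : roots) {
            collect(root, "", out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical form: one flat field and one parent field with two terminal kids
        List<Field> roots = List.of(
                new Field("field1"),
                new Field("address", new Field("street"), new Field("city")));
        System.out.println("root fields: " + roots.size());    // 2
        System.out.println("all fields: " + allFields(roots)); // [field1, address, address.street, address.city]
    }
}
```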
