I have been trying to replace iText (which we adopted a long time ago) with PDFBox in one of our projects. Essentially, we fill out PDF forms, somewhat like a mail merge. This all happens in a J2EE backend system, so usually not many fonts are available.
We recently received a PDF that I could not fill out. I always got this exception:
Exception in thread "main" java.io.IOException: Could not find font: /MicrosoftSansSerif
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFont(PDDefaultAppearanceString.java:176)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:129)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:105)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:87)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.getWidgetDefaultAppearanceString(AppearanceGeneratorHelper.java:302)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:197)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:264)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:219)
at test.Z.main(Z.java:142)
According to "PDFBox API: How to change font to handle Cyrillic values in an AcroForm field", you should catch this IOException and replace the font with one of the built-in fonts. However, even after doing this I received the same exception:
String da = pdField.getDefaultAppearance();
boolean found = false;
for (COSName fontName : defaultResources.getFontNames()) {
    // Match the whole font key, so that "/Helv" does not also match "/Helvetica ..."
    if (da.startsWith("/" + fontName.getName() + " ")) {
        found = true;
        break;
    }
}
if (!found) {
    System.out.println("font: " + da + " not found, replacing with Helv");
    pdField.setDefaultAppearance("/Helv 0 Tf 0 g");
    System.out.println("AFTER REPLACE DA: " + pdField.getDefaultAppearance());
    // Drop the existing appearance stream of the first widget so it gets regenerated
    pdField.getWidgets().get(0).setAppearance(null);
}
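The prefix check above only works when the DA string begins with the font operand. As a minimal sketch (the class and method names here are hypothetical, and it assumes the DA string is plain whitespace-separated operators without escaped characters in the font name), the font key can instead be read from the operands in front of the Tf operator, using nothing but the JDK:

```java
public class DaFontName {

    /**
     * Returns the font key used by the Tf operator of a DA string, without the
     * leading slash, or null if no "/Name size Tf" sequence is present.
     */
    public static String fontNameOf(String da) {
        if (da == null) {
            return null;
        }
        String[] tokens = da.trim().split("\\s+");
        // Tf takes two operands: the font name and the font size
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // prints "MicrosoftSansSerif"
        System.out.println(fontNameOf("/MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg"));
        // prints "Helv", even though the colour operator comes first
        System.out.println(fontNameOf("0 g /Helv 12 Tf"));
    }
}
```

The returned key could then be compared for equality against the names from defaultResources.getFontNames(), instead of relying on startsWith.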
So I looked deeper and discovered that the form field in question had multiple widgets. Iterating over them revealed that each had its own DA set. The API doesn't give you direct access to it, so I checked getCOSObject().getString(COSName.DA) on each widget directly and, voilà, the missing font was sitting in there.
I added the font replacement code for each widget, and the form can now be filled out successfully.
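One side effect of replacing the whole DA with "/Helv 0 Tf 0 g" is that the original font size and colour are discarded. A minimal sketch of swapping only the font key while keeping the remaining operators might look like this (the class and method names are hypothetical, and it assumes a DA made of plain whitespace-separated tokens):

```java
public class DaFontSwap {

    /**
     * Returns the DA string with the font operand of every Tf operator replaced
     * by the given font key; size and colour operators are left untouched.
     */
    public static String withFont(String da, String newFontKey) {
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                tokens[i - 2] = "/" + newFontKey;
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // prints "/Helv 0 Tf 0.000000 0.000000 0.000000 rg"
        System.out.println(withFont("/MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg", "Helv"));
    }
}
```

The result could be written back the same way as in the post, for example with widget.getCOSObject().setString(COSName.DA, ...) for a widget or setDefaultAppearance(...) for a field.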
There are two things I don't really understand:
- Is this the correct approach?
- Why is it so difficult? Wouldn't it be much simpler to have an option for PDFBox to automatically replace any missing font with a default?
Thank you for your input.
For reference, this is the code needed to fill out this form:
package test;

import java.io.File;
import java.io.IOException;
import java.util.Map.Entry;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public class Z {

    // Replacement DA: the Helv font from the default resources, auto font size, black
    private static final String FALLBACK_DA = "/Helv 0 Tf 0 g";

    /** Returns true if the DA string starts with a font key that exists in the given resources. */
    private static boolean startsWithKnownFont(String da, PDResources resources) {
        for (COSName fontName : resources.getFontNames()) {
            // Match the whole font key, so that "/Helv" does not also match "/Helvetica ..."
            if (da.startsWith("/" + fontName.getName() + " ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the document even if filling out a field fails
        try (PDDocument pdfDocument = PDDocument.load(new File("/home/test/template.pdf"))) {
            // Dump the fonts referenced by the page resources
            System.out.println("page resources:");
            for (PDPage page : pdfDocument.getDocumentCatalog().getPages()) {
                PDResources resources = page.getResources();
                for (COSName key : resources.getFontNames()) {
                    System.out.println("font key: " + key.getName());
                    PDFont font = resources.getFont(key);
                    System.out.println("font name: " + font.getName());
                    System.out.println("embedded: " + font.isEmbedded());
                }
            }

            PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
            if (acroForm != null) {
                System.out.println();
                System.out.println("form DA: " + acroForm.getDefaultAppearance());
                System.out.println("form DA (via COSObject): " + acroForm.getCOSObject().getString(COSName.DA));
                System.out.println();

                // Dump the fonts available in the form's default resources (DR)
                System.out.println("form default resources:");
                PDResources defaultResources = acroForm.getDefaultResources();
                for (COSName fontName : defaultResources.getFontNames()) {
                    System.out.println("font name: " + fontName.getName());
                }
                System.out.println();
                System.out.println("form default resources font dictionary:");
                COSDictionary fontDict = (COSDictionary) defaultResources.getCOSObject().getDictionaryObject(COSName.FONT);
                for (Entry<COSName, COSBase> entry : fontDict.entrySet()) {
                    System.out.println("dict entry: " + entry.getKey().getName());
                }
                System.out.println();

                // First pass: replace every DA that refers to a font missing from the default resources
                System.out.println("form fields:");
                for (PDField pdFieldX : acroForm.getFields()) {
                    // Check for PDTextField, the type we cast to, to avoid a ClassCastException on choice fields
                    if (pdFieldX instanceof PDTextField) {
                        PDTextField pdField = (PDTextField) pdFieldX;
                        System.out.println("field mapping name: " + pdField.getMappingName());
                        System.out.println("field alternate name: " + pdField.getAlternateFieldName());
                        System.out.println("field partial name: " + pdField.getPartialName());
                        System.out.println("DA: " + pdField.getDefaultAppearance());
                        System.out.println("DA (via COSObject): " + pdField.getCOSObject().getString(COSName.DA));

                        // Field-level DA
                        String fieldDa = pdField.getDefaultAppearance();
                        if (fieldDa != null && !startsWithKnownFont(fieldDa, defaultResources)) {
                            System.out.println("font: " + fieldDa + " not found, replacing with Helv");
                            pdField.setDefaultAppearance(FALLBACK_DA);
                            System.out.println("AFTER REPLACE DA: " + pdField.getDefaultAppearance());
                            System.out.println("AFTER REPLACE DA (via COSObject): "
                                    + pdField.getCOSObject().getString(COSName.DA));
                        }

                        // Widget-level DA: each widget may carry its own DA entry
                        for (PDAnnotationWidget widget : pdField.getWidgets()) {
                            String widgetDa = widget.getCOSObject().getString(COSName.DA);
                            System.out.println("WIDGET DA: " + widgetDa);
                            // A widget without its own DA has nothing to replace here
                            if (widgetDa != null && !startsWithKnownFont(widgetDa, defaultResources)) {
                                System.out.println("font: " + widgetDa + " not found, replacing with Helv");
                                widget.getCOSObject().setString(COSName.DA, FALLBACK_DA);
                                System.out.println("AFTER REPLACE WIDGET DA: "
                                        + widget.getCOSObject().getString(COSName.DA));
                            }
                            // Drop the existing appearance stream so it gets regenerated
                            widget.setAppearance(null);
                        }
                    }
                    System.out.println();
                }
                System.out.println();
                System.out.println();
                System.out.println();

                // Second pass: fill out the text fields
                for (PDField pdFieldX : acroForm.getFields()) {
                    if (pdFieldX instanceof PDTextField) {
                        PDTextField pdField = (PDTextField) pdFieldX;
                        System.out.println("setting value on field: " + pdField.getPartialName());
                        pdField.setValue("a value");
                    }
                }
                // acroForm.setNeedAppearances(true);
                // acroForm.refreshAppearances();
                // acroForm.flatten();
            }

            // Save the filled out form; the document is closed by try-with-resources.
            pdfDocument.save("/home/test/filled.pdf");
        }
    }
}
and its output:
page resources:
font key: TT0
font name: AAAAAD+PalatinoLinotype-Bold
embedded: true
font key: TT1
font name: AAAAAE+MicrosoftSansSerif
embedded: true
font key: TT2
font name: AAAAAG+MicrosoftSansSerif
embedded: true
font key: TT3
font name: AAAAAI+PalatinoLinotype-Roman
embedded: true
font key: TT4
font name: AAAAAJ+MicrosoftSansSerif
embedded: true
font key: TT5
font name: AAAAAF+MicrosoftSansSerif
embedded: true
font key: TT6
font name: AAAAAL+PalatinoLinotype-Bold
embedded: true
font key: TT7
font name: AAAAAN+PalatinoLinotype-Bold
embedded: true
font key: TT8
font name: AAAAAB+MicrosoftSansSerif
embedded: true
font key: TT9
font name: AAAAAP+ArialMT
embedded: true
form DA: /Helv 0 Tf 0 g
form DA (via COSObject): /Helv 0 Tf 0 g
form default resources:
font name: Helv
font name: HelveticaNeue-ThinItalic
font name: MSReferenceSansSerif
font name: ZaDb
form default resources font dictionary:
dict entry: Helv
dict entry: HelveticaNeue-ThinItalic
dict entry: MSReferenceSansSerif
dict entry: ZaDb
form fields:
field mapping name: null
field alternate name: null
field partial name: field1
DA: /Helv 12 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /Helv 12 Tf 0.000000 0.000000 0.000000 rg
WIDGET DA: /Helv 12 Tf 0.000000 0.000000 0.000000 rg
field mapping name: null
field alternate name: null
field partial name: field2
DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /Helv 0 Tf 0.000000 0.000000 0.000000 rg
WIDGET DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg
field mapping name: null
field alternate name: null
field partial name: field3
DA: /HelveticaNeue-ThinItalic 0 Tf 0 g
DA (via COSObject): /HelveticaNeue-ThinItalic 0 Tf 0 g
WIDGET DA: /HelveticaNeue-ThinItalic 0 Tf 0 g
field mapping name: null
field alternate name: null
field partial name: field4
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g
field mapping name: null
field alternate name: null
field partial name: field5
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g
field mapping name: null
field alternate name: null
field partial name: field6
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g
field mapping name: null
field alternate name: null
field partial name: field7
DA: /Helv 0 Tf 0 g
DA (via COSObject): null
WIDGET DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE WIDGET DA: /Helv 0 Tf 0 g
WIDGET DA: /MSReferenceSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
field mapping name: null
field alternate name: null
field partial name: field8
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g
field mapping name: null
field alternate name: null
field partial name: field9
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g
field mapping name: null
field alternate name: null
field partial name: field10
DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /Helv 0 Tf 0.000000 0.000000 0.000000 rg
WIDGET DA: /Helv 0 Tf 0.000000 0.000000 0.000000 rg
field mapping name: null
field alternate name: null
field partial name: field11
DA: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
DA (via COSObject): /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg
font: /MicrosoftSansSerif 0 Tf 0.000000 0.000000 0.000000 rg not found, replacing with Helv
AFTER REPLACE DA: /Helv 0 Tf 0 g
AFTER REPLACE DA (via COSObject): /Helv 0 Tf 0 g
WIDGET DA: /Helv 0 Tf 0 g
setting value on field: field1
setting value on field: field2
setting value on field: field3
setting value on field: field4
setting value on field: field5
setting value on field: field6
setting value on field: field7
setting value on field: field8
setting value on field: field9
setting value on field: field10
setting value on field: field11
field7 was the issue in my case.