
I am using PDFBox 2.0.5 to fill out the form fields of a PDF document using this code:

        doc = PDDocument.load(inputStream);
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        for (PDField field : form.getFieldTree()){
            field.setValue("должен");
        }

I get this error:

    U+0434 ('afii10069') is not available in this font Times-Roman (generic: TimesNewRomanPSMT) encoding: StandardEncoding with differences
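As far as I can tell from the PDFBox 2.0 sources, this exception is thrown while `setValue` builds the field's appearance stream: the text is passed through `PDFont.encode`, and a simple font whose encoding has no code for a character throws an `IllegalArgumentException`. A minimal sketch of testing a font up front, assuming PDFBox 2.0.x; the class and method names (`FontCoverageCheck`, `canEncode`) are hypothetical:

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class FontCoverageCheck {

    /**
     * Hypothetical helper: returns true if every character of the text can be
     * encoded with the font's encoding. PDFont.encode throws
     * IllegalArgumentException for characters it cannot map.
     */
    static boolean canEncode(PDFont font, String text) throws IOException {
        try {
            font.encode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // The standard 14 Times-Roman font uses WinAnsiEncoding, which has no Cyrillic codes
        PDFont times = PDType1Font.TIMES_ROMAN;
        System.out.println(canEncode(times, "must"));   // true
        System.out.println(canEncode(times, "должен")); // false
    }
}
```

The check looks at the font's encoding, so a font can fail it even when its embedded font program happens to contain the glyphs.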

The PDF document itself contains Cyrillic text, which is displayed fine. I have tried using different fonts. With "Arial Unicode MS", Acrobat Reader wants to download the 50 MB "Adobe Acrobat Reader DC Font Pack". Is this a requirement for Cyrillic characters?

Which font do I have to specify in the text field to handle Cyrillic (or Asian) characters?

Thanks, Ropo

Tilman Hausherr
ropo
  • When I open the PDF in Acrobat Reader I can enter Cyrillic characters in the input field – ropo Mar 20 '17 at 12:59
  • PDFBox only uses the font specified for the form field, while some PDF viewers use additional fall-back fonts if the specified font lacks required characters. Thus, the font specified has to contain all glyphs you are likely to insert. Unfortunately you did not show how you *create* the form, merely how you *fill it in*. Thus, it is difficult to tell how to better *create* it... – mkl Mar 20 '17 at 15:41
  • Thanks for the reply. Do you have instructions on how I should create the PDF so that it works? I am using Acrobat Pro. Are there other/better editors? I can select a font for the form field, but have no idea which font contains Cyrillic glyphs. I am using Arial as the text font and the characters display correctly in the text, but I still cannot insert Cyrillic characters into the form field. The final PDF file with the filled-out form fields should be viewable/printable by anyone without first installing a font pack. – ropo Mar 21 '17 at 08:16
  • I just tried to create a form field with an appropriate font using my old Adobe Acrobat 9.5 here. Unfortunately the font was embedded using only **WinAnsiEncoding** which does not include Cyrillic glyphs. – mkl Mar 21 '17 at 09:33
  • I found this issue: https://issues.apache.org/jira/browse/PDFBOX-3138 saying "The embedded font used by the field does indeed contain Hebrew glyphs, and a valid "cmap" table which can be used to look up those glyphs. The mentioned character, U+05D7, is indeed is present in the font. The embedded font file is in OpenType format, however the PDF Font dictionary is Type1 and specifies WinAnsiEncoding, which does not include Hebrew characters. So, strictly speaking, the field cannot be filled using any non-ANSI characters and so PDFBox's behaviour is correct." – ropo Mar 21 '17 at 10:15
  • My Acrobat Pro DC lets me specify a font for a field. I can fill out the field in Acrobat Reader with Cyrillic characters, just not with PDFBox. I tried to subscribe to the PDFBox mailing list hours ago but have not received a reply – ropo Mar 21 '17 at 10:18
  • *"I can fill out the field in Acrobat Reader with cyrillic characters, just not with PDFBox."* - As mentioned before and in your quote, PDFBox does what is expected: If the font associated with a form field specifies **WinAnsiEncoding**, then this form field strictly speaking accepts only characters present in **WinAnsiEncoding**. Checking whether some embedded font program actually contains additional glyphs and adapting the encoding, or actually adding another font as a fallback, is an extra feature; it is not a natural part of form-filling. That being said, it's a worthwhile feature... – mkl Mar 21 '17 at 10:36
  • Tried another approach. Instead of setValue() I called ((PDTextField)field).setDefaultValue(); It does not throw an exception, but unfortunately in the result PDF I still see the previous default value in the document. The new default value only appears in the properties of the field – ropo Mar 21 '17 at 11:03
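Following mkl's diagnosis, one way to see which encoding each form font declares is to list the fonts in the AcroForm's default resources (`/DR`). The sketch below assumes PDFBox 2.0.x; `FormFontReport` and `describeFonts` are hypothetical names, and it only reports the font dictionary's `/Encoding` entry, not which glyphs the embedded font program actually contains:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

public class FormFontReport {

    /** Hypothetical helper: one line per font with name, subtype, embedding and /Encoding. */
    static List<String> describeFonts(PDResources dr) throws IOException {
        List<String> lines = new ArrayList<>();
        if (dr == null) {
            return lines;
        }
        for (COSName name : dr.getFontNames()) {
            PDFont font = dr.getFont(name);
            if (font == null) {
                continue; // entry is not a font dictionary
            }
            // /Encoding is a name (e.g. WinAnsiEncoding), a dictionary
            // (base encoding plus /Differences), or absent
            COSBase enc = font.getCOSObject().getDictionaryObject(COSName.ENCODING);
            String encText = enc instanceof COSName ? ((COSName) enc).getName()
                    : enc == null ? "(none)" : "(encoding dictionary)";
            lines.add(name.getName() + ": " + font.getName()
                    + ", subtype=" + font.getSubType()
                    + ", embedded=" + font.isEmbedded()
                    + ", encoding=" + encText);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path of the PDF form to inspect
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
            if (form != null) {
                describeFonts(form.getDefaultResources()).forEach(System.out::println);
            }
        }
    }
}
```

A field font reported with `encoding=WinAnsiEncoding` is the case described in PDFBOX-3138: PDFBox will only accept WinAnsi characters for it, whatever the embedded font program contains.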

2 Answers


Adobe handles this by reusing the font file embedded in the /Ubuntu font and creating a new font resource from it. Here is a quick hack which can serve as a guide for achieving something similar. The code is specific to a sample I have.

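    import java.io.File;

    import org.apache.fontbox.ttf.TrueTypeFont;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
    import org.apache.pdfbox.pdmodel.font.PDType0Font;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

    PDDocument doc = PDDocument.load(new File(...));
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    // the form's default resources (/DR) hold the fonts used by the fields;
    // "Ubuntu" and "Text2" below are names from my sample file
    PDResources formResources = acroForm.getDefaultResources();
    PDTrueTypeFont font = (PDTrueTypeFont) formResources.getFont(COSName.getPDFName("Ubuntu"));

    // here is the 'magic' to reuse the font as a new font resource:
    // wrap the embedded TrueType program as a Type0 font, which is not
    // limited to a single-byte encoding such as WinAnsiEncoding
    TrueTypeFont ttFont = font.getTrueTypeFont();

    PDFont font2 = PDType0Font.load(doc, ttFont, true); // true = embed a subset only
    ttFont.close();

    // register the new font as /F0 and make the field use it (size 0 = auto, 0 g = black)
    formResources.put(COSName.getPDFName("F0"), font2);

    PDTextField formField = (PDTextField) acroForm.getField("Text2");
    formField.setDefaultAppearance("/F0 0 Tf 0 g");
    formField.setValue("öäüинформацию");

    doc.save(...);
    doc.close();
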
Maruan Sahyoun
  • Thanks for this great workaround. I know you are very busy these days. I also hope you will receive your present by mail soon – ropo Mar 28 '17 at 09:01
  • I had a similar issue which was resolved by using `PDType0Font.load` instead of `PDTrueTypeFont.loadTTF`. – Aaron Blenkush Oct 23 '17 at 23:53
  • `PDType0Font.load(doc, ttFont, true);` should be corrected to `PDType0Font.load(doc, ttFont, false);` to avoid subsetting. – Tilman Hausherr Aug 18 '20 at 09:09
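The answer's code is tied to one sample file (font resource `Ubuntu`, field `Text2`). A slightly more general sketch, assuming PDFBox 2.0.x and a form whose default resources contain an embedded simple TrueType font whose font program includes the needed glyphs, a usable cmap and embedding permission: it rewraps the first such font as a Type0 font with `PDType0Font.load`, passing `false` as suggested in the last comment so the full font is embedded, and points every text field's default appearance at it. `UnicodeFormFont`, `findEmbeddedTrueType`, `useType0ForTextFields` and the resource name argument are hypothetical:

```java
import java.io.IOException;

import org.apache.fontbox.ttf.TrueTypeFont;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public class UnicodeFormFont {

    /** Hypothetical helper: resource name of the first embedded TrueType font, or null. */
    static COSName findEmbeddedTrueType(PDResources dr) throws IOException {
        if (dr == null) {
            return null;
        }
        for (COSName name : dr.getFontNames()) {
            PDFont font = dr.getFont(name);
            if (font instanceof PDTrueTypeFont && font.isEmbedded()) {
                return name;
            }
        }
        return null;
    }

    /**
     * Hypothetical helper: registers a Type0 copy of that font under newResourceName
     * and makes every text field use it. Returns the number of fields changed.
     */
    static int useType0ForTextFields(PDDocument doc, String newResourceName, float fontSize)
            throws IOException {
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return 0;
        }
        PDResources dr = form.getDefaultResources();
        COSName source = findEmbeddedTrueType(dr);
        if (source == null) {
            return 0;
        }
        TrueTypeFont ttf = ((PDTrueTypeFont) dr.getFont(source)).getTrueTypeFont();
        // false = embed the whole font program, not a subset
        PDFont type0 = PDType0Font.load(doc, ttf, false);
        dr.put(COSName.getPDFName(newResourceName), type0);

        String da = "/" + newResourceName + " " + fontSize + " Tf 0 g"; // 0 g = black text
        int count = 0;
        for (PDField field : form.getFieldTree()) {
            if (field instanceof PDTextField) {
                ((PDTextField) field).setDefaultAppearance(da);
                count++;
            }
        }
        return count;
    }
}
```

It would be called before filling, e.g. `UnicodeFormFont.useType0ForTextFields(doc, "F0", 0)`, followed by `setValue` on the fields. If the first TrueType font found is not the one with the right glyphs, pick the resource name explicitly, as the answer does.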

The solution was trivial: `form.setNeedAppearances(true);`

Then I removed the blue box around the field with `field.setReadOnly(true);`
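The approach in context, as a minimal sketch assuming PDFBox 2.0.x (`FillWithNeedAppearances` and `fillAll` are hypothetical names). As far as I can tell from the 2.0 sources, when the AcroForm's `NeedAppearances` flag is already set, `setValue` stores the value without building an appearance stream, so the font encoding check never runs; the viewer is then expected to render the field itself, using its own fallback fonts if necessary. The flag is therefore set before any value is written:

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public class FillWithNeedAppearances {

    /** Hypothetical helper: fills every text field and makes it read-only; returns the count. */
    static int fillAll(PDDocument doc, String value) throws IOException {
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return 0;
        }
        // Set the flag first: it asks the viewer to regenerate field appearances,
        // so PDFBox 2.0.x does not build them itself in setValue()
        form.setNeedAppearances(true);
        int count = 0;
        for (PDField field : form.getFieldTree()) {
            if (field instanceof PDTextField) {
                field.setValue(value);
                field.setReadOnly(true); // per the answer, this removes the blue field box
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input PDF, args[1] = output PDF
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            fillAll(doc, "должен");
            doc.save(args[1]);
        }
    }
}
```

The trade-off is that the filled-in text depends on the viewer honoring `NeedAppearances`; software that only displays existing appearance streams may show the fields as empty.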

ropo