I need help adding a Cyrillic value to a form field using the PDFBox API. Here is what I have so far:

PDDocument document = PDDocument.load(file);
PDDocumentCatalog dc = document.getDocumentCatalog();
PDAcroForm acroForm = dc.getAcroForm();
PDField naziv = acroForm.getField("naziv");
naziv.setValue("Наслов"); // this part right here
naziv.setValue("Naslov"); // it works like this

It works perfectly when my input is in the Latin alphabet, but I need to handle Cyrillic input as well. How can I do that?

P.S. This is the exception I get: Caused by: java.lang.IllegalArgumentException: U+043D ('afii10079') is not available in this font Helvetica encoding: WinAnsiEncoding

Tilman Hausherr
Nenad Vichentikj
  • The CreateSimpleFormWithEmbeddedFont.java example shows how to use a specific font, so part of that code can be reused. Do you need this for any PDF or just for one specific field of a specific PDF? Can you share the PDF? – Tilman Hausherr Dec 27 '17 at 17:47
  • Sure. I'll make the PDF public on my Google Drive. Here is the link: https://drive.google.com/open?id=1eI1iRQnrxMA2kEVJPLH9FhQMx2_2kMHj – Nenad Vichentikj Dec 27 '17 at 17:57

1 Answer

The code below adds a font that contains Cyrillic glyphs to the AcroForm default resources dictionary and replaces the font name in the default appearance string of every text field. PDFBox then recreates the appearance stream of a field with the new font when you call setValue().

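import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public static void main(String[] args) throws IOException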
{
    PDDocument doc = PDDocument.load(new File("ZPe.pdf"));
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    PDResources dr = acroForm.getDefaultResources();

    // Important: the font is Type0 (allows more than 256 glyphs) and NOT SUBSETTED
    PDFont font = PDType0Font.load(doc, new FileInputStream("c:/windows/fonts/arial.ttf"), false);

    COSName fontName = dr.add(font);
    Iterator<PDField> it = acroForm.getFieldIterator();
    while (it.hasNext())
    {
        PDField field = it.next();
        if (field instanceof PDTextField)
        {
            PDTextField textField = (PDTextField) field;
            String da = textField.getDefaultAppearance();

            // replace font name in default appearance string
            Pattern pattern = Pattern.compile("\\/(\\w+)\\s.*");
            Matcher matcher = pattern.matcher(da);
            if (!matcher.find())
            {
                // no font name found in the default appearance; leave this field unchanged
                continue;
            }
            String oldFontName = matcher.group(1);
            da = da.replaceFirst(oldFontName, fontName.getName());

            textField.setDefaultAppearance(da);
        }
    }
    acroForm.getField("name1").setValue("Наслов");
    doc.save("result.pdf");
    doc.close();
}
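
The font-name replacement inside the loop above can also be pulled out into a small helper that needs nothing but java.util.regex. This is only a sketch: the class and method names (DefaultAppearanceRewriter, replaceFontName) are made up for illustration and are not part of PDFBox. It assumes the default appearance contains a font operand in the usual "/Name size Tf" form, and it returns the string unchanged when that is not the case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DefaultAppearanceRewriter {

    // Matches the font operand of the Tf operator, e.g. "/Helv 12 Tf" or "/Helv 0 Tf".
    // Group 1 is the font resource name without the leading slash.
    private static final Pattern FONT_OPERAND =
            Pattern.compile("/(\\S+)(\\s+[-+]?[0-9.]+\\s+Tf)");

    /**
     * Returns the default appearance string with the font name swapped for newFontName.
     * The input is returned unchanged if it is null or has no recognizable font operand.
     */
    public static String replaceFontName(String da, String newFontName) {
        if (da == null) {
            return null;
        }
        Matcher m = FONT_OPERAND.matcher(da);
        if (!m.find()) {
            return da;
        }
        // Splice by position so the old name is never interpreted as a regex.
        return da.substring(0, m.start(1)) + newFontName + da.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(replaceFontName("/Helv 0 Tf 0 g", "F1")); // prints "/F1 0 Tf 0 g"
    }
}
```

In the loop you would then call something like textField.setDefaultAppearance(DefaultAppearanceRewriter.replaceFontName(da, fontName.getName())). Anchoring the match on the Tf operator keeps other slashes in the string from being picked up by accident.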

Update 4.4.2019: to save some space, it may be useful to remove the appearance before calling setValue:

acroForm.getField("name1").getWidgets().get(0).setAppearance(null);

To check whether there are unused fonts in the AcroForm default resources, see this answer.

Update 7.4.2019: you may experience poor performance if the font is very large (e.g. ArialUni) and many fields are to be set (PDFBOX-4508). In that case, save and reload the file before calling setValue.
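
One way to do the save-and-reload step without a temporary file is to go through a byte array. This is a rough sketch assuming PDFBox 2.x (the same API generation as PDDocument.load(File) used above); the class and method names (SaveAndReload, saveAndReload) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

public class SaveAndReload {

    /**
     * Saves the document to memory, closes it, and loads a fresh copy.
     * The caller must use (and eventually close) the returned document,
     * not the one that was passed in.
     */
    public static PDDocument saveAndReload(PDDocument doc) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        doc.save(buffer);
        doc.close();
        return PDDocument.load(buffer.toByteArray());
    }
}
```

You would call it after the default appearances have been rewritten and before the first setValue(), fetching the AcroForm again from the returned document. For very large files a temporary file may be a better fit than an in-memory buffer.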

To find out whether a font supports an intended text, call PDFont.encode() and check for IllegalArgumentException.
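
A minimal sketch of that check, assuming PDFBox 2.x (where PDType1Font.HELVETICA is available as a constant); the class and method names (FontSupportCheck, canEncode) are invented for this example:

```java
import java.io.IOException;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class FontSupportCheck {

    /**
     * Returns true if every character of the text can be encoded with the font.
     * PDFont.encode() throws IllegalArgumentException for a glyph it cannot map.
     */
    public static boolean canEncode(PDFont font, String text) throws IOException {
        try {
            font.encode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Helvetica with WinAnsiEncoding is the font from the exception in the question.
        PDFont helvetica = PDType1Font.HELVETICA;
        System.out.println("Latin:    " + canEncode(helvetica, "Naslov"));
        System.out.println("Cyrillic: " + canEncode(helvetica, "Наслов"));
    }
}
```

The same helper can be called with the PDType0Font loaded in the code above, before setValue(), to decide whether the chosen font covers the text or a different font is needed.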

Tilman Hausherr
  • For anyone who is googling, this is the correct answer to solve the exception "pdfbox is not available in this font's encoding: WinAnsiEncoding" – Putin Oct 15 '21 at 13:35