I am trying to fill a pdf field with Chinese characters from an fdf or xfdf.

So far I have tried pdftk, mcpdf, PDFBox and FPDM.

They can all get the characters into the field, but the characters don't display. When I click on the field to edit it, the characters show as expected, but when I click out of the field again they disappear. If I input English text it is displayed incorrectly, e.g. "hello" becomes "IFMMP".

This has all led me to suspect an issue with fonts or character maps. I have tried embedding the full font into the pdf, which made no difference, and I have installed the fonts on the machine, to no avail.
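The "hello" to "IFMMP" symptom can be checked with a little arithmetic. The sketch below (plain Java; the class and method names are made up for illustration) computes the difference between each displayed character and the typed one. A constant offset would suggest the character codes are being remapped systematically rather than glyphs simply being missing, although it does not by itself say which character map is responsible.

```java
import java.util.Arrays;

public class GlyphOffsetCheck {

    // Difference (shown - typed) for each character position.
    static int[] offsets(String typed, String shown) {
        int[] diffs = new int[typed.length()];
        for (int i = 0; i < typed.length(); i++) {
            diffs[i] = shown.charAt(i) - typed.charAt(i);
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Strings taken from the symptom described in the question.
        int[] diffs = offsets("hello", "IFMMP");
        System.out.println(Arrays.toString(diffs)); // prints [-31, -31, -31, -31, -31]
    }
}
```

Every character is shifted by the same amount, which fits the suspicion of an encoding mismatch between the field value and the font used for the appearance.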

If I edit the pdf and fill the field in Acrobat it accepts the Chinese characters without a problem and I can view the pdf in a reader. I have tried using pdftk from the command line on the same Windows machine and I am having the same problem.

I need this to work in a Linux environment, preferably in Python or through a command-line script, but at this point I'd just like to see it work at all! I have attached the sample pdf, fdf and xfdf, and the output being created. Any help would be greatly appreciated, as I've run out of ideas. I have been using the command:

"pdftk test_form.pdf fill_form test.xfdf output output.pdf verbose"

https://drive.google.com/folderview?id=0B6ExNaWGFzvnfnJHSC1ZdXhSU2RQVENjYW56UkZyYWJMdWhZTkpQYkZBcUs0Tjhjb0NITVE&usp=sharing

Matthew Wise
  • Have you exported from a correctly filled form, and compared that (X)FDF with what you have? – Max Wyss Mar 18 '15 at 10:42
  • I have exported a correctly filled form to fdf and then tried using that to populate the same form and had the same failure. – Matthew Wise Mar 18 '15 at 11:39
  • OK, I asked that to make sure that the FDF is indeed correct, which it apparently is. If nothing free/OS works, and it justifies some investment, you might look at FDFMerge by Appligent (maybe first contact them about the specifics). – Max Wyss Mar 18 '15 at 20:33
  • Thanks, I'll look into that. Frustrating, because I know the text is getting in; it's just not being encoded properly for display for some odd reason. – Matthew Wise Mar 19 '15 at 14:10
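
One way to double-check what an XFDF file actually carries, independently of any PDF tool, is to read the field names and values straight from the XML. Below is a minimal sketch in plain Java, assuming the usual flat XFDF layout (field elements with a name attribute and a value child). The XfdfDump class and the inline sample document are illustrative only; they are not the files attached to the question, and nested (hierarchical) fields are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XfdfDump {

    // Returns field name -> value for every <field> element found in the XFDF bytes.
    static Map<String, String> readFields(byte[] xfdf) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XFDF elements live in the Adobe XFDF namespace
            Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xfdf));

            Map<String, String> result = new LinkedHashMap<>();
            NodeList fields = dom.getElementsByTagNameNS("*", "field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                NodeList values = field.getElementsByTagNameNS("*", "value");
                // A field without a <value> child is reported as an empty string.
                String value = values.getLength() > 0 ? values.item(0).getTextContent() : "";
                result.put(field.getAttribute("name"), value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XFDF", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document; the field name and text are placeholders.
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<xfdf xmlns=\"http://ns.adobe.com/xfdf/\"><fields>"
                + "<field name=\"Chinese\"><value>\u6728\u5170\u8F9E</value></field>"
                + "</fields></xfdf>";
        Map<String, String> fields = readFields(sample.getBytes(StandardCharsets.UTF_8));
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

If the values come out intact here, the data file is fine and the problem lies in how the filling tool builds the field appearance.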

1 Answer

When a form field is filled, the field's value is populated and (optionally) a visual appearance reflecting the newly set value is generated for the field. The reason you see the value when you click into the form field is that the field's value is displayed while the field is active; as long as the field is not active, the field's stored appearance is used instead.

If you tried setting the value with PDFBox 1.8, you might try PDFBox 2.0 instead, as it now has Unicode support and the appearance generation has been reworked.

You also need to ensure that the font used in the form is available on the system where you fill the form. Otherwise, with PDFBox 2.0 you might get an error message similar to

Warning: Using fallback font 'TimesNewRomanPSMT' for 'MingLiU'
Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+5185 in font MingLiU

This happens because MingLiU is not available on the system, so it has been replaced by TimesNewRomanPSMT, which doesn't contain the required character.
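
The error message identifies the missing glyph by its Unicode code point. To see which character of the input that is, a small helper can list the code points of a string in the same U+XXXX notation. This is only a sketch in plain Java (the CodePoints class name is made up for illustration) and does not involve PDFBox at all.

```java
import java.util.stream.Collectors;

public class CodePoints {

    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Escapes are used so the example does not depend on the source file encoding.
        System.out.println(describe("\u6728\u5170\u8F9E")); // prints U+6728 U+5170 U+8F9E
    }
}
```

Comparing this output with the code point in the exception shows which character the fallback font could not supply.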

Alternatively, you can instruct Adobe Reader to generate the appearance for you when the form is opened, using

PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
form.setNeedAppearances(true);

This again assumes PDFBox 2.0.
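
To make the value/appearance distinction concrete, here is a deliberately simplified toy model in plain Java. It is not PDFBox code and not how any real viewer is implemented; the Field class and the render rules are assumptions that just mirror the description above: a focused field shows its value, an unfocused field shows its stored appearance, and a set NeedAppearances flag makes the viewer rebuild the appearance from the value.

```java
public class AppearanceModel {

    // Hypothetical stand-in for a text field: a value plus a stored appearance.
    static class Field {
        String value = "";
        String appearance = "";
    }

    // What a viewer would paint under the simplified rules of this model.
    static String render(Field field, boolean focused, boolean needAppearances) {
        if (focused || needAppearances) {
            return field.value;      // the viewer draws the value itself
        }
        return field.appearance;     // the viewer replays the stored appearance
    }

    public static void main(String[] args) {
        Field field = new Field();
        // A filling tool that sets only the value and leaves the appearance untouched.
        field.value = "\u6728\u5170\u8F9E";

        System.out.println(!render(field, true, false).isEmpty());  // prints true
        System.out.println(!render(field, false, false).isEmpty()); // prints false
        System.out.println(!render(field, false, true).isEmpty());  // prints true
    }
}
```

The middle case is the symptom from the question: the value is present, but nothing useful is shown until the field is clicked or the viewer is told to regenerate appearances.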

I've created a little sample using PDFBox 2.0 that builds a form from scratch, to test whether it can handle the Chinese text:

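// imports needed by this snippet (PDFBox 2.0 package names)
import java.io.File;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

// create a new PDF document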
PDDocument doc = new PDDocument();
PDPage page = new PDPage();

// add a new AcroForm and add that to the document
PDAcroForm form = new PDAcroForm(doc);
doc.getDocumentCatalog().setAcroForm(form);

// Add and set the resources and default appearance at the form level
PDFont font = PDType0Font.load(doc, new File("/Library/Fonts/Arial Unicode.ttf"));
PDResources res = new PDResources();
COSName fontName = res.add(font);
form.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
form.setDefaultAppearance(da);

// add a page to the document 
doc.addPage(page);

// add a form field to the form
PDTextField textBox = new PDTextField(form);
textBox.setPartialName("Chinese");
form.getFields().add(textBox);

// specify the annotation associated with the field
// and add it to the page
PDAnnotationWidget widget = textBox.getWidget();
PDRectangle rect = new PDRectangle(100f,300f,120f,350f);
widget.setRectangle(rect);
page.getAnnotations().add(widget);

// set the field value
textBox.setValue("木兰辞");
doc.save("ChineseOut.pdf");
// release the resources held by the document
doc.close();

This works fine. I also tested with the font you are using; unfortunately this produced an error, because MingLiU is a TrueType collection, which PDFBox cannot handle at this point in time.
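
To find out whether a given font file is a TrueType collection before handing it to a library, one can look at its first four bytes: a collection file starts with the tag "ttcf". The following is a rough sketch in plain Java; the TtcCheck class is made up for illustration, and the font path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class TtcCheck {

    // True if the given leading bytes of a font file carry the "ttcf" collection tag.
    static boolean isTrueTypeCollection(byte[] header) {
        return header.length >= 4
                && new String(header, 0, 4, StandardCharsets.US_ASCII).equals("ttcf");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TtcCheck <font-file>");
            return;
        }
        byte[] header = new byte[4];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int read = in.read(header, 0, 4);
            // A short read is treated as "not a collection".
            header = Arrays.copyOf(header, Math.max(read, 0));
        }
        System.out.println(isTrueTypeCollection(header) ? "TrueType collection" : "single font file");
    }
}
```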

Maruan Sahyoun
  • Really appreciate your help, you have saved me a lot of searching. I will look into this. Your comment about appearances led me to discover the need_appearances flag in pdftk, which still doesn't populate the form properly for viewing in Linux, but does populate it so that I can load it in Adobe Reader on Windows. It also doesn't flatten the form properly; I am hopeful that there is a viable way to flatten the pdf using PDFBox or another program. Thanks again. – Matthew Wise Mar 21 '15 at 10:50
  • Flattening the form is a different issue, as it means removing the form fields and the widgets from the document and making the fields' appearances part of the page content. That has been answered here: http://stackoverflow.com/questions/14454387/pdfbox-how-to-flatten-a-pdf-form – Maruan Sahyoun Mar 21 '15 at 12:21
  • Great, currently I'm using the pdfbox-app from https://repository.jboss.org/nexus/content/groups/public/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/ as it contains the relevant dependencies. I'm trying to use the script you have provided as a proof of concept, but I'm getting a NullPointerException when it gets to textBox.setValue: at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearance(PDVariableText.java:86). Is there another jar you would recommend? It's been some time since I wrote Java code, so apologies if I've missed something obvious. – Matthew Wise Mar 21 '15 at 12:54
  • Please use the latest snapshot from https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.0-SNAPSHOT/ as this issue has only been fixed recently. The other possibility is to set the default appearance on the field instead of the form, using textBox.setDefaultAppearance() – Maruan Sahyoun Mar 21 '15 at 13:56
  • yes, I tried that first but how can I get matching fontbox and xmpbox versions for the snapshot? Should I build my own from trunk? – Matthew Wise Mar 21 '15 at 13:59
  • look at http://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/ from there you can either download the pdfbox-app (convenient as it has all dependencies) or the individual libs – Maruan Sahyoun Mar 21 '15 at 14:05
  • Great, that worked. Now I'm having the issue that importFDF is not populating my form. If I manually set the fields using form.getField and setValue it works but when I'm using the same form and xfdf I gave as examples above and the code: FDFDocument fdf = FDFDocument.loadXFDF(new File("./input.xfdf")); form.importFDF(fdf); the fields remain empty. – Matthew Wise Mar 21 '15 at 15:53
  • the issue has been fixed and a new snapshot version is available for download – Maruan Sahyoun Mar 22 '15 at 17:09
  • with the latest snapshot for PDFBox 2.0.0 there is now also support for TrueType collections. – Maruan Sahyoun Apr 09 '15 at 07:37