I have some questions about PDF form filling. First, some context: I am trying to build a 100% Python PDF form-filling service, and for that I am using the pdfrw library.
Here is my code. It takes a PDF path and a data_dict (JSON converted to a dict) as arguments:
import pdfrw

_ANNOT_KEY = "/Annots"
_ANNOT_FIELD_KEY = "/T"
_ANNOT_VAL_KEY = "/V"
_ANNOT_RECT_KEY = "/Rect"
_SUBTYPE_KEY = "/Subtype"
_WIDGET_SUBTYPE_KEY = "/Widget"


def fill_pdf_with_values(input_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    # Ask viewers to regenerate the field appearances from the values.
    template_pdf.Root.AcroForm.update(
        pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject("true"))
    )
    for page in template_pdf.pages:
        # Read each page's own annotations; a page may have none.
        annotations = page[_ANNOT_KEY] or []
        for annotation in annotations:
            if annotation[_SUBTYPE_KEY] != _WIDGET_SUBTYPE_KEY:
                continue
            if not annotation[_ANNOT_FIELD_KEY]:
                continue
            # Field names are stored as "(Name)"; strip the parentheses.
            key = annotation[_ANNOT_FIELD_KEY][1:-1]
            if key not in data_dict:
                continue
            if isinstance(data_dict[key], bool):
                if data_dict[key]:
                    # If the value is True, the checkbox should be checked.
                    # "On" does not seem to be required, i.e. any name can be used,
                    # but without this line I can't get the checkbox to work.
                    # annotation.update(pdfrw.PdfDict(AS=pdfrw.PdfName("On")))
                    annotation.update(
                        pdfrw.PdfDict(AP=data_dict[key], AS=pdfrw.PdfName("On"))
                    )
                else:
                    # If the value is False, the checkbox should stay unchecked.
                    # annotation.update(pdfrw.PdfDict(AS=pdfrw.PdfName("Off")))
                    annotation.update(
                        pdfrw.PdfDict(AP=data_dict[key], AS=pdfrw.PdfName("Off"))
                    )
                continue
            annotation.update(pdfrw.PdfDict(AP=data_dict[key], V=data_dict[key]))
    output_pdf = pdfrw.PdfWriter()
    output_pdf.write("test.pdf", template_pdf)
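For reference, here is a minimal sketch of how data_dict is built from the JSON payload. The payload is made up for illustration: the field names Nomdusage and Nomdenaissance come from my form, but the values and the checkbox field name "Accord" are hypothetical.

```python
import json

# Hypothetical payload: the values and the "Accord" field are placeholders.
payload = '{"Nomdusage": "Dupont", "Nomdenaissance": "Martin", "Accord": true}'
data_dict = json.loads(payload)

# JSON true/false become Python bools, which is what the
# isinstance(..., bool) check in fill_pdf_with_values relies on
# to tell checkboxes apart from text fields.
checkbox_fields = sorted(k for k, v in data_dict.items() if isinstance(v, bool))
text_fields = sorted(k for k, v in data_dict.items() if not isinstance(v, bool))

print("checkboxes:", checkbox_fields)
print("text fields:", text_fields)
```

The resulting data_dict is what I pass as the second argument of fill_pdf_with_values.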
But I am struggling to make it work. Here are my two problems:
Depending on the PDF viewer, the data in the text fields is not displayed, and the same goes for my checkboxes. I don't know enough about PDF to tell what differs between viewers. What do I need to set for the values to be displayed in every viewer?
I also have a big problem with one particular field: I can edit it when I open the "clean" PDF, but after I pass it through my code, nothing is written and the text is no longer editable. Also, when I print the corresponding annotation for the "bugged" one, here is what I get (before filling):
annotation = {'/AP': {'/N': (216, 0)}, '/DA': '(/Helv 0 Tf 0 g)', '/DR': {'/Encoding': {'/PDFDocEncoding': {'/Differences': ['24', '/breve', '/caron', '/circumflex', '/dotaccent', '/hungarumlaut', '/ogonek', '/ring', '/tilde', '39', '/quotesingle', '96', '/grave', '128', '/bullet', '/dagger', '/daggerdbl', '/ellipsis', '/emdash', '/endash', '/florin', '/fraction', '/guilsinglleft', '/guilsinglright', '/minus', '/perthousand', '/quotedblbase', '/quotedblleft', '/quotedblright', '/quoteleft', '/quoteright', '/quotesinglbase', '/trademark', '/fi', '/fl', '/Lslash', '/OE', '/Scaron', '/Ydieresis', '/Zcaron', '/dotlessi', '/lslash', '/oe', '/scaron', '/zcaron', '160', '/Euro', '164', '/currency', '166', '/brokenbar', '168', '/dieresis', '/copyright', '/ordfeminine', '172', '/logicalnot', '/.notdef', '/registered', '/macron', '/degree', '/plusminus', '/twosuperior', '/threesuperior', '/acute', '/mu', '183', '/periodcentered', '/cedilla', '/onesuperior', '/ordmasculine', '188', '/onequarter', '/onehalf', '/threequarters', '192', '/Agrave', '/Aacute', '/Acircumflex', '/Atilde', '/Adieresis', '/Aring', '/AE', '/Ccedilla', '/Egrave', '/Eacute', '/Ecircumflex', '/Edieresis', '/Igrave', '/Iacute', '/Icircumflex', '/Idieresis', '/Eth', '/Ntilde', '/Ograve', '/Oacute', '/Ocircumflex', '/Otilde', '/Odieresis', '/multiply', '/Oslash', '/Ugrave', '/Uacute', '/Ucircumflex', '/Udieresis', '/Yacute', '/Thorn', '/germandbls', '/agrave', '/aacute', '/acircumflex', '/atilde', '/adieresis', '/aring', '/ae', '/ccedilla', '/egrave', '/eacute', '/ecircumflex', '/edieresis', '/igrave', '/iacute', '/icircumflex', '/idieresis', '/eth', '/ntilde', '/ograve', '/oacute', '/ocircumflex', '/otilde', '/odieresis', '/divide', '/oslash', '/ugrave', '/uacute', '/ucircumflex', '/udieresis', '/yacute', '/thorn', '/ydieresis'], '/Type': '/Encoding'}}, '/Font': {'/Helv': {'/BaseFont': '/Helvetica', '/Name': '/Helv', '/Subtype': '/Type1', '/Type': '/Font'}}}, '/F': '4', '/FT': '/Tx', '/P': (12, 0), '/Rect': 
['453.96', '455.04', '749.16', '463.2'], '/Subtype': '/Widget', '/T': '(Nomdusage)', '/TU': '(Nomdusage)', '/Type': '/Annot'}
whereas for all the other fields, which are supposed to be used the same way, I get:
annotation = {'/DA': '(/Helv 12 Tf 0 g)', '/F': '4', '/FT': '/Tx', '/MK': {}, '/P': (12, 0), '/Rect': ['129.105', '454.669', '395.032', '463.725'], '/Subtype': '/Widget', '/T': '(Nomdenaissance)', '/TU': '(Nomdenaissance)', '/Type': '/Annot'}
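To make the differences easier to see, here is a small sketch that compares the two dumps key by key. The dictionaries are copied from the output above as plain Python dicts (not pdfrw objects), with /DR shortened to its /Font entry. The helper name diff_annotations is just something made up for this post.

```python
def diff_annotations(a, b, ignore=()):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    keys_a = set(a) - set(ignore)
    keys_b = set(b) - set(ignore)
    only_a = sorted(keys_a - keys_b)
    only_b = sorted(keys_b - keys_a)
    changed = sorted(k for k in keys_a & keys_b if a[k] != b[k])
    return only_a, only_b, changed


# The "bugged" field (Nomdusage); /DR is abbreviated, /Encoding omitted.
bugged = {
    "/AP": {"/N": (216, 0)},
    "/DA": "(/Helv 0 Tf 0 g)",
    "/DR": {"/Font": {"/Helv": {"/BaseFont": "/Helvetica", "/Name": "/Helv",
                                "/Subtype": "/Type1", "/Type": "/Font"}}},
    "/F": "4", "/FT": "/Tx", "/P": (12, 0),
    "/Rect": ["453.96", "455.04", "749.16", "463.2"],
    "/Subtype": "/Widget", "/T": "(Nomdusage)", "/TU": "(Nomdusage)",
    "/Type": "/Annot",
}

# One of the fields that behaves normally (Nomdenaissance).
working = {
    "/DA": "(/Helv 12 Tf 0 g)",
    "/F": "4", "/FT": "/Tx", "/MK": {}, "/P": (12, 0),
    "/Rect": ["129.105", "454.669", "395.032", "463.725"],
    "/Subtype": "/Widget", "/T": "(Nomdenaissance)", "/TU": "(Nomdenaissance)",
    "/Type": "/Annot",
}

# Position and names are expected to differ, so leave them out.
only_bugged, only_working, changed = diff_annotations(
    bugged, working, ignore=("/Rect", "/T", "/TU")
)
print("only in bugged: ", only_bugged)
print("only in working:", only_working)
print("different value:", changed)
```

So apart from its position and name, the bugged field is the only one that already carries /AP and /DR entries, it has no /MK, and its /DA uses font size 0 instead of 12.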
With this, I can't tell whether I am doing something wrong. My guess is that the "clean" PDF has a badly built annotation for this field. I have tried a lot of different things, but I can't find a solution online.
If needed, I can provide the PDF and a data set.
Thanks for reading and for your time! I hope you can help me with this :)