As the title says, I'm trying to fill some PDF fields with a Python script. Here is what I used:
[My working environment properties]
0 - Operating system : Windows 7 32-bit
1 - Python version 3.8.3.
2 - An editable PDF file; you can get it here: Editable_PDF.pdf
3 - The pdfrw library to read and write the PDF file.
4 - Field names and values from an external configuration file that I named Field_Value.ini. Here is its content:
R's #=R: 1111
C's #=C: 2222
R's Address=3333
C's Address=4444
Date Filed=5555
Docket #=6666
As you can see, the part before the = on each line is the name of one of the fields in that PDF file, and the part after it is the value to fill in.
I got the field names from the pdfforms utility with this command line:
pdfforms inspect Editable_PDF.pdf
It creates a .json file (named fields.json) that contains information about each field found in that PDF file.
5 - A Python script to read that .ini file and fill the fields of Editable_PDF.pdf.
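For reference, the name=value layout from item 4 can be read into a dictionary with a few lines of plain Python. This is only a minimal sketch: parse_field_values is a hypothetical helper name, and it assumes each line holds exactly one field, with everything after the first = belonging to the value.

```python
def parse_field_values(text):
    """Parse 'name=value' lines into a dict, splitting on the first '=' only."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        name, sep, value = line.partition('=')
        if not sep:
            continue  # no '=' on this line, so it is not a name/value pair
        fields[name] = value
    return fields


# A few lines in the same shape as Field_Value.ini
sample = "R's #=R: 1111\nDate Filed=5555\nDocket #=6666\n"
fields = parse_field_values(sample)
print(fields)
```

The standard configparser module is not used here because the file has no [section] header, which configparser requires.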
I have actually succeeded with most of this, except for one small thing. I'm asking about it because I honestly found no solution for it, and I'm still looking.
The thing is that all the fields listed in Field_Value.ini are filled except one, Docket #. No matter what I do, it just won't be filled. The funny thing is that if you fill it manually in your browser or a PDF editor, it is filled. At first I thought I might have used the wrong field name, but I think it is correct: whatever value I type into that field manually shows up in fields.json under the same name, Docket #.
So the question is: what is wrong with that Docket # field? There is something I'm not getting, and I feel it is something simple.
I don't want you to write the script for this; I have already written one, at least for testing. All you need is Python with the pdfrw library installed, and this is the script:
import pdfrw

PDF_PATH = 'Editable_PDF.pdf'
ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'


def write_fillable_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    # Only the widgets on the first page are examined.
    annotations = template_pdf.pages[0][ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                # Strip the surrounding parentheses from the PDF string.
                key = annotation[ANNOT_FIELD_KEY][1:-1]
                if key in data_dict:
                    annotation.update(
                        pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                    )
    # Ask the viewer to regenerate the field appearances so the values show.
    template_pdf.Root.AcroForm.update(
        pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)


if __name__ == '__main__':
    data_dict = {}
    with open('Field_Value.ini', 'r') as file:
        for line in file:
            line = line.rstrip('\n')
            if '=' not in line:
                continue  # skip blank or malformed lines
            # Split on the first '=' only, so values may contain '='.
            field, value = line.split('=', 1)
            print(field, ' = ', value)
            data_dict[field] = value
    write_fillable_pdf(PDF_PATH, PDF_PATH, data_dict)
It might be a little ugly, but it does what I need. I thought you might have an idea about this, so any help is appreciated, and thank you even for taking the time to read it.
EDIT:
It seems that pdfrw is somehow not detecting that field name.
I say that because I tried printing the detected fields while pdfrw was processing the PDF file, with print(key) or print(annotation[ANNOT_FIELD_KEY][1:-1]). It lists almost all the field names except the one I'm looking for, Docket #, so I think that is why it doesn't fill that field.
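A wider version of that check might help narrow it down. The sketch below rests on two assumptions that I have not verified against this PDF: the widget may sit on a page other than the first, or its /T entry may live on a /Parent field dictionary instead of on the widget itself (the PDF format allows both). The helper names full_field_name and list_field_names are made up for illustration.

```python
def full_field_name(annotation):
    """Join the /T entries found on a widget and along its /Parent chain."""
    parts = []
    seen = set()
    node = annotation
    while node is not None and id(node) not in seen:
        seen.add(id(node))  # guard against a cyclic /Parent chain
        title = node.get('/T')
        if title is not None:
            # pdfrw strings keep their PDF delimiters; decode() removes them
            parts.append(title.decode() if hasattr(title, 'decode') else str(title))
        node = node.get('/Parent')
    # Names were collected from child to root, so reverse before joining
    return '.'.join(reversed(parts))


def list_field_names(pdf_path):
    """Print the name of every widget on every page (needs pdfrw installed)."""
    import pdfrw  # imported here so the helper above has no dependency

    pdf = pdfrw.PdfReader(pdf_path)
    for page_number, page in enumerate(pdf.pages, start=1):
        for annotation in page.get('/Annots') or []:
            if annotation.get('/Subtype') == '/Widget':
                print(page_number, repr(full_field_name(annotation)))


# Example call (the path is the file from this question):
# list_field_names('Editable_PDF.pdf')
```

If Docket # only appears in this listing, then the script's lookup (first page only, /T on the widget only) would be the reason it is skipped.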
Anyway, I worked around this another way: I found that pdftk can do it with a simple command line, using a .fdf file instead of the .ini, so for the moment it is solved that way.
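Roughly, the .fdf file is a small text file that lists each field name with its value. The sketch below shows one way such a file could be generated from the same name/value pairs. The helper names build_fdf and escape_pdf_string and the file names in the final comment are placeholders, and it assumes plain ASCII names and values.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace('\\', '\\\\').replace('(', '\\(').replace(')', '\\)')


def build_fdf(fields):
    """Return minimal FDF text with one /T (name) and /V (value) per field."""
    entries = ''.join(
        '<< /T ({}) /V ({}) >>\n'.format(
            escape_pdf_string(name), escape_pdf_string(value))
        for name, value in fields.items()
    )
    return (
        '%FDF-1.2\n'
        '1 0 obj\n'
        '<< /FDF << /Fields [\n' + entries + '] >> >>\n'
        'endobj\n'
        'trailer\n'
        '<< /Root 1 0 R >>\n'
        '%%EOF\n'
    )


fdf_text = build_fdf({'Docket #': '6666', 'Date Filed': '5555'})
print(fdf_text)

# Assumed usage once the text is saved as data.fdf (file names are placeholders):
#   pdftk Editable_PDF.pdf fill_form data.fdf output Filled.pdf
```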
If you think the problem with pdfrw can be solved, that would be good to know. Any help is appreciated.
Regards.