-1

I am trying to bulk stamp a number of pdf files, I found something on github that does something very similar but you have to name each file within the script to match with the actual pdf file for it to work.

https://github.com/iprapas/pythonpdf

def stamp_pdf(input_path, stamp_path, output_path, add_frame=False):
    output = PdfFileWriter()
    create_pdf_stamp(stamp_path, add_frame=add_frame)
    pdf_in = PdfFileReader(open(input_path, 'rb'))
    pdf_stamp = PdfFileReader(open(stamp_path, 'rb'))
    stamp = pdf_stamp.getPage(0)

    for i in xrange(pdf_in.getNumPages()):
        page = pdf_in.getPage(i)
        page.mergePage(stamp)
        output.addPage(page)

    with open(output_path, 'wb') as f:
        output.write(f)

def main():
    stamp_pdf('../input/input1.pdf', '../temp/tmp_stamp.pdf', '../output/stamped1.pdf')
    stamp_pdf('../input/input1.pdf', '../temp/tmp_stamp.pdf', '../output/stamped1_with_frame.pdf', add_frame=True)
    stamp_pdf('../input/input2.pdf', '../temp/tmp_stamp.pdf', '../output/stamped2.pdf')
    stamp_pdf('../input/input2.pdf', '../temp/tmp_stamp.pdf', '../output/stamped2_with_frame.pdf', add_frame=True)

if __name__ == "__main__":

main()

I'm sure there's a way to replace the individual file link so that it points directly to the directory and keeps the file name with it as well. Any pointers to get me started would be much appreciated as I have been trying out all sorts of codes without much luck.

1 Answers1

1

Easily access and manage paths and filenames with pathlib


Example:

from pathlib import Path

p = Path.cwd()
print(p)
>>> WindowsPath('E:/PythonProjects/DataCamp')

pdf_files = list(p.glob('*.pdf'))
print(pdf_files)
>>> [WindowsPath('E:/PythonProjects/DataCamp/aapl.pdf')]

pdf_name = pdf_files[0].name
print(pdf_name)
>>> 'aapl.pdf'
  • Use the glob method to find all the pdf files, including subdirectories, with ** wildcards
    • p.glob('**/*.pdf')
  • Use name to get and easily track the filename

Output files:

out_dir = p / 'output'
print(out_dir)
>>> WindowsPath('E:/PythonProjects/DataCamp/output')

out_pdf = out_dir / f'stamped_{pdf_name}'
print(out_pdf)
>>> WindowsPath('E:/PythonProjects/DataCamp/output/stamped_aapl.pdf')

pythonpdf library might not work with pathlib objects:

  • easily convert pathlib objects back to str
print(type(stamp_path))
>>> pathlib.WindowsPath

print(type(str(stamp_path))
>>> str

create_pdf_stamp(str(stamp_path), add_frame=add_frame)

iterating through .glob:

  • The .glob object is a generator function
p = Path('e:/PythonProjects')
files = p.glob('**/*.pdf')

for file in files:
    print(file)
    ...
    # do other stuff
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • Hi @Trenton_M , thanks so much for your help, I have managed to link it to the right directory and kept the existing file name. Could you point me to the right direction for looping through all the pdf_files in the input folder? – user11284044 Aug 23 '19 at 10:43