I am trying to extract only the first page from each of multiple PDF files and combine those pages into one file. I receive 150 PDF files a day; the first page of each is the invoice, which I need, and the following 3 to 12 pages are just backup, which I do not need. So the input is 150 PDF files of varying size, and the output I want is one PDF file containing only the first page of each of the 150 files.
What I seem to have done instead is merge all the pages EXCEPT the first page (which is the only one I need).
# Get all PDF documents in current directory
import os
pdf_files = []
for filename in os.listdir("."):
    if filename.endswith(".pdf"):
        pdf_files.append(filename)
pdf_files.sort(key=str.lower)
# Take first page from each PDF
from PyPDF2 import PdfFileWriter, PdfFileReader
for filename in pdf_files:
    reader = PdfFileReader(filename)
    writer = PdfFileWriter()
    for pageNum in range(1, reader.numPages):
        page = reader.getPage(pageNum)
        writer.addPage(page)
with open("CombinedFirstPages.pdf", "wb") as fp:
    writer.write(fp)
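
For comparison, here is a minimal sketch of what the intended behaviour might look like. It assumes the same legacy PyPDF2 API used above (PdfFileReader, PdfFileWriter, numPages, getPage, addPage); newer PyPDF2/pypdf releases renamed these, so the calls would need adjusting there. Two things differ from the code above: page indices are zero-based, so the first page is getPage(0) while range(1, reader.numPages) starts at the second page, and a single writer is created once before the loop so that pages from every file accumulate in it. The function names, the `exclude` parameter, and the optional `reader_cls`/`writer_cls` parameters are hypothetical, added only for illustration and to keep the sketch testable without real PDFs.

```python
import os


def list_pdfs(directory=".", exclude=None):
    """Return PDF filenames in `directory`, sorted case-insensitively.

    `exclude` (hypothetical parameter) skips one filename, e.g. the
    combined output from a previous run in the same directory.
    """
    names = [
        f for f in os.listdir(directory)
        if f.lower().endswith(".pdf") and f != exclude
    ]
    return sorted(names, key=str.lower)


def combine_first_pages(pdf_paths, output_path, reader_cls=None, writer_cls=None):
    """Write page 1 of every PDF in `pdf_paths` into one file.

    Returns the number of pages written. By default the legacy PyPDF2
    classes are used (assumption: PyPDF2 < 3.0 is installed).
    """
    if reader_cls is None or writer_cls is None:
        from PyPDF2 import PdfFileReader, PdfFileWriter
        reader_cls = reader_cls or PdfFileReader
        writer_cls = writer_cls or PdfFileWriter

    writer = writer_cls()  # one writer shared by all input files
    written = 0
    for path in pdf_paths:
        reader = reader_cls(path)
        if reader.numPages > 0:
            # Index 0 is the first page; later pages are ignored.
            writer.addPage(reader.getPage(0))
            written += 1

    # Write once, after all first pages have been collected.
    with open(output_path, "wb") as fp:
        writer.write(fp)
    return written
```

Under those assumptions, calling `combine_first_pages(list_pdfs(".", exclude="CombinedFirstPages.pdf"), "CombinedFirstPages.pdf")` in the folder of invoices should produce one output page per input file.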