I'm very new to Python. I just started a week ago and am trying to learn some cool stuff around PDF, but really don't know how to go about this.
I have the attached pdf file that I would like to extract all the pages between the keywords "PAGE START" and "PAGE END" and save as single files in 1 folder. For example, create a folder called "test" and save pages 3,4,5 under the filename first. pdf, as well as pages 10, 11, and 12 under the filename second. pdf. The pages identified are all between PAGE START AND PAGE END, not including the PAGE START AND PAGE END pages themselves..
My attempt:
from PyPDF2 import PdfFileReader, PdfFileWriter
import re
reader = PdfFileReader("test.pdf")
StartString = "PAGE START"
EndString = "PAGE END"
for page in range(reader.getNumPages() - 1):
writer = PdfFileWriter()
PageObj = reader.getPage(page)
Text = PageObj.extractText()
ResSearch = re.search(StartString, Text)
if ResSearch is not None:
start = information[page][1]
end = information[page][2]
while start < end:
writer.addPage(reader.getPage(start))
start += 1
output_filename = "{}_{}_page_{}.pdf".format(
information[page][0], information[page][1], information[page][2]
)
with open(output_filename, "wb") as out:
writer.write(out)