I have a function that takes a PDF file path as input and splits the PDF into separate pages, as shown below:

import os,time
from pyPdf import PdfFileReader, PdfFileWriter

def split_pages(file_path):
    print("Splitting the PDF")
    temp_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "temp_"+str(int(time.time())))
    if not os.path.exists(temp_path):
        os.makedirs(temp_path)
    inputpdf = PdfFileReader(open(file_path, "rb"))
    if inputpdf.getIsEncrypted():
        inputpdf.decrypt('')
    for i in xrange(inputpdf.numPages):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open(os.path.join(temp_path,'%s.pdf'% i),"wb") as outputStream:
            output.write(outputStream)

It works for small files, but when the PDF has more than 152 pages it only splits the first 152 pages (0-151) and then stops. It also uses up all of the system's memory before I kill it.

What am I doing wrong, or where is the problem occurring, and how can I correct it?

Vishnu Y S

1 Answer

It seems the problem is in pyPdf itself. I switched to PyPDF2 and it worked.
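
For reference, here is a minimal sketch of the same function ported to PyPDF2. It assumes Python 3 and PyPDF2 1.28 or later, which provides the PdfReader and PdfWriter classes. The out_dir parameter, the use of tempfile.mkdtemp, and the returned list of paths are illustrative additions, not part of the original function.

```python
import os
import tempfile


def split_pages(file_path, out_dir=None):
    """Write each page of file_path to its own PDF and return the output paths."""
    # Imported here so this sketch loads even where PyPDF2 is not installed.
    # Assumption: PyPDF2 >= 1.28, which exposes PdfReader and PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    if out_dir is None:
        # Illustrative choice: a fresh temporary directory.
        out_dir = tempfile.mkdtemp(prefix="temp_")
    else:
        os.makedirs(out_dir, exist_ok=True)

    paths = []
    # The context manager closes the input file once all pages are written.
    with open(file_path, "rb") as source:
        reader = PdfReader(source)
        if reader.is_encrypted:
            # Same empty-password attempt as in the question.
            reader.decrypt("")
        for i, page in enumerate(reader.pages):
            # One writer per page, so each output file holds a single page.
            writer = PdfWriter()
            writer.add_page(page)
            out_path = os.path.join(out_dir, "%s.pdf" % i)
            with open(out_path, "wb") as output_stream:
                writer.write(output_stream)
            paths.append(out_path)
    return paths
```

Calling split_pages("input.pdf") (a hypothetical file name) should return paths ending in 0.pdf, 1.pdf, and so on, one per page. Opening the input with a with block also closes the file handle, which the original code left open.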

Vishnu Y S