
I want to split a long PDF document into many parts, e.g. part 1 comprising pages 3-14, part 2 comprising pages 15-19, part 3 comprising pages 20-27, using PyPDF2.

I coded a loop that takes the relevant pages out of the original PDF and saves them as a new doc, for each part. The only problem is that part 2 still includes all the pages from part 1, and part 3 still includes the pages from parts 1 & 2.

I assume I somehow have to 'reset' output = PdfFileWriter(), but if I put it into the while loop I get a long error message.

output = PdfFileWriter()
input = PdfFileReader(open("%s" % pdf, "rb"))

current_row = 2

i =   sheet.cell(row = current_row, column = 4).value 
j =   sheet.cell(row = current_row, column = 5).value
org = sheet.cell(row = current_row, column = 1).value 
n =   sheet.cell(row = current_row, column = 7).value

while i > 0:
    while i <= j:
        p = i-1
        output.addPage(input.getPage(p))
        i += 1
        print(i, p, j)
    print org

    outputStream = file("%s_%s_%s.pdf" % (mysheet, n, org), "wb")
    output.write(outputStream)
    outputStream.close()

    current_row += 1
    i =   sheet.cell(row = current_row, column = 4).value 
    j =   sheet.cell(row = current_row, column = 5).value
    org = sheet.cell(row = current_row, column = 1).value
    n =   sheet.cell(row = current_row, column = 7).value
GEOCHET
sh_python
  • resetting `output = PdfFileWriter()` is indeed the solution, what error did you get? – franciscod Jul 13 '15 at 10:44
  • After you close `outputStream`, just assign a new `PdfFileWriter()` to `output`. – martineau Jul 13 '15 at 10:54
  • Including this line solved the problem - thank you! The error message was actually due to the PDF being encrypted. The problem here was that the PDFs that my code generated all started with page 1 of the original document. Resetting output = PdfFileWriter() takes care of that. – sh_python Jul 19 '15 at 21:03
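The fix described in the comments above, creating a fresh `PdfFileWriter()` for each part, might look roughly like the sketch below. It assumes a PyPDF2 release older than 3.0 (where `PdfFileReader`, `PdfFileWriter`, `addPage` and `getPage` still exist) and an unencrypted source file. The names `page_indices`, `split_pdf` and `rows` are hypothetical helpers introduced for illustration; `rows` stands in for the values read from the spreadsheet.

```python
def page_indices(first, last):
    """Convert a 1-based, inclusive page range into 0-based page indices."""
    return list(range(first - 1, last))


def split_pdf(src_path, rows):
    """Write one PDF per row; each row is (first_page, last_page, out_path)."""
    # Imported here so page_indices() can be used without PyPDF2 installed.
    # These are the pre-3.0 PyPDF2 names used in the question.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(src_path, "rb") as src:
        reader = PdfFileReader(src)
        for first, last, out_path in rows:
            # A new writer for every part, so pages from earlier
            # parts are not carried over into later ones.
            writer = PdfFileWriter()
            for index in page_indices(first, last):
                writer.addPage(reader.getPage(index))
            with open(out_path, "wb") as out:
                writer.write(out)
```

A call such as `split_pdf("input.pdf", [(3, 14, "part1.pdf"), (15, 19, "part2.pdf")])` would then write each range to its own file; the file names here are placeholders.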

1 Answer


This is what I tried. I ran it on my own PDF files, for which I knew the exact page numbers.

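from PyPDF2 import PdfFileWriter, PdfFileReader

# Part name -> (first page, last page), 1-based and inclusive.
pages = {'part1': (3, 14), 'part2': (15, 19), 'part3': (20, 27)}

with open("result.pdf", "rb") as src:
    ip = PdfFileReader(src)
    for name, (first, last) in pages.items():
        op = PdfFileWriter()  # fresh writer for every part
        # getPage() is 0-based, so shift the start by one.
        for i in range(first - 1, last):
            op.addPage(ip.getPage(i))
        # open() works on Python 2 and 3; note the '.' before the extension.
        with open(name + '.pdf', 'wb') as f:
            op.write(f)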

I tried this on my system and it worked. I would be happy to hear how I can improve the answer.

Ja8zyjits
  • Hi, thanks for the comment - unfortunately it didn't work for me and I'm too much of a Python novice to be able to tell you why not. – sh_python Jul 19 '15 at 21:02