
I wrote a simple Python script that takes a PDF, goes over its pages using PyPDF2, and saves each page as a new PDF file. Here is the page-saving function:

from PyPDF2 import PdfReader, PdfWriter


def save_pdf_page(file_name, page_index):
    reader = PdfReader(file_name)
    writer = PdfWriter()
    writer.add_page(reader.pages[page_index])
    writer.remove_links()
    with open(f"output_page{page_index}.pdf", "wb") as fh:
        writer.write(fh)

Surprisingly, each single-page file is about the same size as the original PDF. Calling remove_links() (taken from here) didn't reduce the page size.

I found a similar question here, which says this may happen because PyPDF output files are uncompressed.

Is there a way, using PyPDF2 or any other Python library, to make each page file relatively small, as expected?

Martin Thoma
  • You are right. I realized I didn't ask the right question. I should have asked why splitting a PDF and merging the pages back results in a much bigger file than the original one. – Yaron Gertel Jun 15 '22 at 09:32

1 Answer


You are running into this issue: https://github.com/py-pdf/PyPDF2/issues/449

Essentially, there are two problems:

  1. Pages may reference resources that are shared across the document, e.g. font information
  2. PyPDF2 might not detect that a given page doesn't need some of those resources, so it copies them into every output file anyway
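To make the two problems above concrete, here is a stdlib-only toy model. It is an assumption-laden sketch, not PyPDF2's real data structures: every name in it (SHARED_RESOURCES, PAGES, naive_page_size, pruned_page_size) is hypothetical. It compares copying a page together with the whole shared resource dictionary against copying only the resources whose names actually appear in that page's content stream. Real PDFs are messier (nested Form XObjects with their own /Resources, indirect references, names inside text strings), so treat it as an illustration of the idea only.

```python
import re

# Toy model (hypothetical, not PyPDF2's data structures): a document-level
# resource dictionary shared by every page, mapping resource names to the
# raw bytes of the object they point at (fonts, images, ...).
SHARED_RESOURCES = {
    "/F1": b"x" * 40_000,    # stand-in for an embedded font program
    "/F2": b"y" * 60_000,    # second font, used only on page 2
    "/Im1": b"z" * 200_000,  # stand-in for a large image, used only on page 3
}

# Content streams of three pages; each draws with only some of the resources.
PAGES = [
    b"BT /F1 12 Tf (Page one) Tj ET",
    b"BT /F2 12 Tf (Page two) Tj ET",
    b"q 100 0 0 100 0 0 cm /Im1 Do Q",
]

# Matches PDF name tokens such as /F1 or /Im1 inside a content stream.
NAME_TOKEN = re.compile(rb"/[A-Za-z0-9]+")


def naive_page_size(content: bytes) -> int:
    """Size if the page is copied with the whole shared resource dict."""
    return len(content) + sum(len(obj) for obj in SHARED_RESOURCES.values())


def pruned_page_size(content: bytes) -> int:
    """Size if only resources named in the content stream are copied."""
    used = {tok.decode() for tok in NAME_TOKEN.findall(content)}
    kept = {name: obj for name, obj in SHARED_RESOURCES.items() if name in used}
    return len(content) + sum(len(obj) for obj in kept.values())


if __name__ == "__main__":
    for i, content in enumerate(PAGES, start=1):
        print(f"page {i}: naive={naive_page_size(content)} "
              f"pruned={pruned_page_size(content)}")
```

In the naive case every single-page file carries all 300,000 bytes of shared resources, which matches the symptom in the question (each page file is about as large as the whole document).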

Removing links might help. Additionally, you might want to follow the docs on reducing file size:

from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("test.pdf")
writer = PdfWriter()
for page_num in [2, 3]:
    page = reader.pages[page_num]

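    # This is CPU intensive! It compresses the page's content streams
    # with zlib/deflate (the PDF /FlateDecode filter)
    page.compress_content_streams()

    writer.add_page(page)

# Drop link annotations before writing the output file
writer.remove_links()
with open("separate.pdf", "wb") as fh:
    writer.write(fh)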
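compress_content_streams() applies the /FlateDecode filter, which the PDF specification defines as zlib/deflate compression. The stdlib-only sketch below imitates that step on a made-up, uncompressed content stream (not one read from a real PDF) to show why it can shrink text-heavy pages a lot and that the encoding is lossless.

```python
import zlib

# A made-up, uncompressed page content stream: the same text-drawing
# operators repeated many times, roughly like a text-heavy page.
raw_stream = b"".join(
    b"BT /F1 10 Tf 72 %d Td (Some line of body text on the page) Tj ET\n" % y
    for y in range(700, 100, -12)
)

# FlateDecode in PDF is zlib/deflate, so zlib.compress produces the kind of
# data a PDF reader decodes from a stream whose /Filter is /FlateDecode.
compressed = zlib.compress(raw_stream, level=9)

# Decoding must give back exactly the original operators.
assert zlib.decompress(compressed) == raw_stream

ratio = len(compressed) / len(raw_stream)
print(f"raw={len(raw_stream)} bytes, flate={len(compressed)} bytes, ratio={ratio:.2f}")
```

Note that this only shrinks the page's own content stream. Fonts and images reached through the page's /Resources are not touched by this step, so on its own it likely won't fix the shared-resource problem described above.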
Martin Thoma
  • Thanks for the answer, but remove_links didn't reduce the file size in my case. I realized my main problem is the size of the PDF merged back from the split pages. It seems PyPDF2 doesn't optimize well when merging PDF files into one. – Yaron Gertel Jun 15 '22 at 09:28
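One plausible explanation for the merged file growing, consistent with the answer: if every split file carries its own copy of a shared font, merging them back simply concatenates those copies. The toy sketch below (stdlib only; split_files, naive_merge and dedup_merge are hypothetical names, not a PDF library API) shows that effect and how deduplicating byte-identical objects by a content hash removes it. In a real PDF, deduplication would also have to rewrite every indirect reference that pointed at a dropped copy, which this model ignores; check your PDF library's documentation for an option that does this.

```python
import hashlib

# Toy model (hypothetical): each split file is a list of serialized objects,
# and every single-page file carries its own copy of the same shared font.
FONT = b"%font-program%" * 5_000
split_files = [
    [b"page-1 content", FONT],
    [b"page-2 content", FONT],
    [b"page-3 content", FONT],
]


def naive_merge(files):
    """Append every object from every file, duplicates included."""
    return [obj for objs in files for obj in objs]


def dedup_merge(files):
    """Keep only the first copy of each byte-identical object (SHA-256 key)."""
    seen = set()
    merged = []
    for objs in files:
        for obj in objs:
            digest = hashlib.sha256(obj).digest()
            if digest not in seen:
                seen.add(digest)
                merged.append(obj)
    return merged


def total_size(objs):
    """Sum of the serialized sizes of the given objects."""
    return sum(len(o) for o in objs)


naive = naive_merge(split_files)
deduped = dedup_merge(split_files)
print(f"naive merge: {total_size(naive)} bytes, deduplicated: {total_size(deduped)} bytes")
```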