How to count the number of pages in a PDF in Python, including blank pages

Question

I am trying to print the page count of a PDF document that includes some blank (white) pages, using the pyPdf module. However, it skips the blank pages and prints only the count of the remaining pages. Below is the code.

import sys
import pyPdf
from pyPdf import PdfFileReader, PdfFileWriter

pdf_document = PdfFileReader(file(normalpdfpath, "r"))
normal = pdf_document.getNumPages()
print normal

score 4 · Answer 1 · answered Nov 20 '19 at 13:23

Step 1: install PyPDF2

pip install PyPDF2

Step 2: download the PDF and read its page count

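import io

import requests
import PyPDF2

# Placeholder: replace with the full URL of your PDF (requests needs the scheme).
url = 'https://example.com/sample.pdf'
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

with io.BytesIO(response.content) as open_pdf_file:
    # PdfReader / len(reader.pages) replace PdfFileReader / getNumPages(),
    # which were removed in PyPDF2 3.0.
    read_pdf = PyPDF2.PdfReader(open_pdf_file)
    num_pages = len(read_pdf.pages)
    print(num_pages)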

Saleem

Is it also possible to extract page dimensions or orientations (landscape/horizontal)? – x89 Jun 16 '21 at 12:59
Yes, you can. Check this answer: https://stackoverflow.com/questions/46232984/how-to-get-pdf-file-metadata-page-size-using-python – Saleem Aug 19 '21 at 11:26
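
Following up on the comment above, here is a minimal sketch of reading page sizes and orientation. It assumes PyPDF2 2.x or later (`PdfReader`, `page.mediabox`); the helper names `orientation` and `page_orientations` are made up for illustration, and the sketch ignores any /Rotate entry a page may carry.

```python
def orientation(width, height):
    """Classify a page by its width and height (in PDF points)."""
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


def page_orientations(path):
    """Return (width, height, orientation) for every page of the PDF at `path`."""
    # Imported here so that orientation() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 2.x or later

    reader = PdfReader(path)
    result = []
    for page in reader.pages:
        box = page.mediabox  # page boundary rectangle, in points
        width, height = float(box.width), float(box.height)
        result.append((width, height, orientation(width, height)))
    return result
```

Calling `page_orientations("sample.pdf")` would then give one tuple per page; an A4 page in portrait is roughly 595 x 842 points.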

score 2 · Answer 2 · edited Dec 19 '22 at 02:27

You may try this, which worked for me (Python 2):

import re
import os

rxcountpages = re.compile(r"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)

def count_pages(filename):
    data = file(filename,"rb").read()
    return len(rxcountpages.findall(data))

if __name__=="__main__":
    parent = "/Users/username/"
    os.chdir(parent)
    filename = 'LaTeX20120726.pdf'
    print count_pages(filename)

For Python 3.6+:

import re

rxcountpages = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)

def count_pages(filename: str) -> int:
    with open(filename, "rb") as infile:
        data = infile.read()
    return len(rxcountpages.findall(data))

if __name__=="__main__":
    filename = '/Users/username/LaTeX20120726.pdf'
    print(count_pages(filename))

Regards

It's been a while, but this is still useful. I had to change `data = file(filename,"rb").read()` to `data = open(filename,"rb").read()` (i.e., `open` instead of `file`) and `re.compile(r"/Type...` to `re.compile(rb"/Type...` (i.e., use a binary regular expression). – Dietmar May 17 '22 at 08:57
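
To see why the pattern counts page objects but skips the page tree node, and why the `rb` prefix matters on Python 3, here is a small self-contained sketch. `SAMPLE` is a hand-written fragment that only imitates uncompressed PDF objects (it is not a valid PDF), and `count_pages_in_bytes` is a hypothetical helper name.

```python
import re

# Same pattern as in the answer, compiled against bytes.
rxcountpages = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)

# Hand-written fragment imitating uncompressed PDF objects (not a real PDF).
SAMPLE = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type/Page /Parent 2 0 R >>\nendobj\n"
)


def count_pages_in_bytes(data: bytes) -> int:
    """Count '/Type /Page' entries; '/Type /Pages' is excluded by [^s]."""
    return len(rxcountpages.findall(data))


page_count = count_pages_in_bytes(SAMPLE)
print(page_count)  # 2: the two /Page objects, not the /Pages node

# A str pattern cannot be applied to bytes on Python 3, hence the rb"" prefix.
try:
    re.compile(r"/Type\s*/Page([^s]|$)").findall(SAMPLE)
except TypeError as exc:
    print("str pattern on bytes fails:", exc)
```

Keep in mind that this approach only sees page objects stored uncompressed in the file. PDFs that pack their objects into compressed object streams (possible since PDF 1.5) may be undercounted, so a parser such as PyPDF2 is the safer choice when the result looks wrong.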

score -1 · Answer 3 · answered Aug 04 '22 at 09:42

Just for all you googlers, here is an updated version of this answer and comment that works using only built-in packages:

import re

# compile your regex to make it faster
PAGE_COUNT_REGEX = re.compile(
    rb"/Type\s*/Page([^s]|$)", 
    re.MULTILINE|re.DOTALL
)

def get_page_count(floc, regex=PAGE_COUNT_REGEX):
    """Count number of pages in a pdf"""
    with open(floc, "rb") as f:
        return len(regex.findall(f.read()))

get_page_count("path/to/your/file.pdf")
