3

I know of several tools/libraries that can do this, but I want to know whether it is possible by just opening the file as a text file and looking for a keyword.

pyfunc
Chry Cheng

3 Answers

3

Have a look at this: http://www.freevbcode.com/ShowCode.asp?ID=8153
Edit: that code does not work; it may be too old.
I found this instead:

// Requires: using System.IO; using System.Text.RegularExpressions;
public static int GetNoOfPagesPDF(string FileName)
{
    // Read the whole file as text; "using" closes the stream even if reading throws.
    using (StreamReader r = new StreamReader(new FileStream(FileName, FileMode.Open, FileAccess.Read)))
    {
        string pdfText = r.ReadToEnd();
        // Count page objects: "/Type /Page" not followed by "s" (that would be a "/Pages" tree node).
        Regex regx = new Regex(@"/Type\s*/Page[^s]");
        return regx.Matches(pdfText).Count;
    }
}

PS: Tested, and it works. See the source here.

pinichi
  • 1
    FYI - PDF can be written such that you can append changes to the document to the existing file, so if you "delete" pages by appending a new catalog with fewer pages (leaving the old pages in place), this solution will produce incorrect results. – plinth Oct 11 '10 at 17:30
  • The above code didn't work for me; it returned more than the correct number of pages. But it made me realize that much of a PDF is text, and I was able to find the count with a (non-global) regex match: `/Type /Pages\nCount ([0-9]+)`. – ErikE Apr 06 '13 at 00:58
1

[Edit: based on the edited question]

Yes, it is possible by reading the file as text and doing some minimal parsing.

If you read the PDF yourself, you will need to do the parsing. Each page in a PDF is represented by a page object.

The following gives a short overview of how the PDF specification describes pages, along with a link to the PDF spec.
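
As a rough sketch of what that minimal parsing could look like: page objects hang off a page tree whose nodes are dictionaries of type `/Pages`, and each node carries a `/Count` entry giving the number of pages beneath it, so the root node's `/Count` is the document's page count. The class and method names below are hypothetical, not from any library. It assumes the page-tree dictionaries are stored uncompressed, contain no nested `<< >>` dictionaries between `/Type /Pages` and `/Count`, and that the largest `/Count` belongs to the root; none of that is guaranteed (files that use compressed object streams or that were updated incrementally can defeat it).

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class PdfPageCountSketch
{
    // A "/Pages" node with its "/Count", in either order, without crossing a '>'
    // (so the match stays inside one flat dictionary).
    private static readonly Regex PagesCount = new Regex(
        @"/Type\s*/Pages\b[^>]*?/Count\s+(\d+)|/Count\s+(\d+)[^>]*?/Type\s*/Pages\b");

    // Returns the largest /Count found on any /Pages node, or null if none was found.
    public static int? CountFromPageTree(string pdfText)
    {
        int? best = null;
        foreach (Match m in PagesCount.Matches(pdfText))
        {
            string digits = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
            int count;
            if (int.TryParse(digits, out count) && (best == null || count > best))
            {
                best = count;
            }
        }
        return best;
    }

    // Reads the file byte-for-byte as Latin-1 so binary streams do not break decoding.
    public static int? CountFromFile(string path)
    {
        return CountFromPageTree(File.ReadAllText(path, Encoding.GetEncoding("ISO-8859-1")));
    }
}
```

A `null` result means the keyword search found nothing usable, in which case falling back to a real PDF library is the safer option.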

pyfunc
-1

The xpdf utilities package (called xpdf-utils in Debian) includes an application called pdfinfo. It prints the number of pages in the file, among other data.

http://www.linuxquestions.org/questions/programming-9/how-to-find-pdf-page-count-699113/
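
If you want to call pdfinfo from code rather than from a shell, something like the following might work. It is only a sketch: it assumes `pdfinfo` is on the `PATH`, that its output contains a line of the form `Pages: N`, and that simple double-quoting of the path is enough for your file names. The class and method names are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class PdfInfoSketch
{
    // Pulls the number out of the "Pages:" line of pdfinfo's output; null if absent.
    public static int? ParsePageCount(string pdfinfoOutput)
    {
        Match m = Regex.Match(pdfinfoOutput, @"^Pages:\s+(\d+)\s*$", RegexOptions.Multiline);
        int count;
        if (m.Success && int.TryParse(m.Groups[1].Value, out count))
        {
            return count;
        }
        return null;
    }

    // Runs "pdfinfo <file>" and parses its standard output.
    public static int? GetPageCount(string pdfPath)
    {
        var psi = new ProcessStartInfo("pdfinfo")
        {
            Arguments = "\"" + pdfPath + "\"",   // naive quoting; an assumption
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (Process p = Process.Start(psi))
        {
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            return ParsePageCount(output);
        }
    }
}
```

Keeping the parsing in its own method makes it easy to check against captured output without having pdfinfo installed.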

Gadolin