I need to read a PDF's page count cheaply, so that my user can select specific pages (and load only those in higher detail).

The only way I can see to do this with the Magick++ API is the STL function readImages. It loads every page of the PDF as a Magick::Image, which gets quite expensive for large PDF documents (a document of about 50 pages takes roughly 15 s on my machine).

I did read a post on ImageMagick's forums about the ReadOptions class (not documented at the time of writing), which you can pass to readImages to read a lower-density image, but this still takes too long (about 10 s). None of the other options on ReadOptions makes much difference to the speed.

Here is the code I have at the moment:

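    std::vector<Magick::Image> PDFImageList;
    Magick::ReadOptions readOptions;
    // Render at the lowest density and size I could get away with,
    // since only the number of images matters here.
    readOptions.density(Magick::Geometry(2,2));
    readOptions.size(Magick::Geometry(1,1));
    readOptions.depth(8);
    // This call takes too long: it still rasterizes every page.
    Magick::readImages(&PDFImageList, m_pathToPDFFile, readOptions);
    // One image per page, so the list size is the page count.
    const std::size_t numberOfPages = PDFImageList.size();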

I have also tried the Magick::Image::ping() method, but can't find anything in the data it returns that relates to the page count.

Is there any other attribute or undocumented Magick++ feature that I can try to get the page count cheaply?

Diederik
  • Not sure if this is any help or any faster, hence just a comment. Try this at the command-line and see if it helps and if you can adapt it to C++... `identify -format "%s\n" file.pdf` – Mark Setchell May 06 '15 at 13:20
  • There's a `Magick::Image.ping` option, but that'll take about the same amount of time for PDFs. I half remember ImageMagick delegating the PDF format to [another library](http://pages.cs.wisc.edu/~ghost/). It might be faster to use that API directly. – emcconville May 06 '15 at 15:58
  • @MarkSetchell the command line does work, but seems just as slow. Also, I don't want to add an ImageMagick installation as a dependency on the user's PC. @emcconville, I did try the `ping` method, but nothing on that call had info about the total number of pages in the PDF file. I'll try looking at the GhostScript dependency and see how far that gets me. – Diederik May 06 '15 at 17:22
  • Maybe CPDF... http://community.coherentpdf.com... `cpdf -pages yourFile.pdf` – Mark Setchell May 06 '15 at 17:40

1 Answer

Using another question's answer and Qt's process class (QProcess), the program now runs the following on the command line:

    gs -q -dNODISPLAY -c "(input.pdf) (r) file runpdfbegin pdfpagecount = quit"

This prints the page count as the last line of standard output. Since the gs executable is already a requirement of ImageMagick's PDF reading functionality, I'm happy with this solution. It is also quite fast (less than a second for the ~50-page PDF).
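
For anyone who wants to try this without Qt, here is a minimal sketch of the same idea. It swaps QProcess for POSIX `popen`, so it is not the code from my program, and the function names `parsePageCount` and `queryPageCount` are made up for illustration. It assumes `gs` is on the PATH and that the PDF path contains no quotes, parentheses, or backslashes (those would need escaping for both the shell and PostScript).

```cpp
#include <climits>
#include <cstdlib>
#include <stdio.h>   // popen/pclose are POSIX, not standard C++
#include <string>

// Hypothetical helper: extract the page count from Ghostscript's output.
// The count is expected on the last non-empty line; returns -1 if that
// line is not a non-negative integer.
int parsePageCount(const std::string& output) {
    // Skip trailing whitespace and newlines.
    const std::size_t end = output.find_last_not_of(" \t\r\n");
    if (end == std::string::npos) return -1;

    // Locate the start of the last non-empty line.
    const std::size_t nl = output.find_last_of("\r\n", end);
    const std::size_t begin = (nl == std::string::npos) ? 0 : nl + 1;
    const std::string line = output.substr(begin, end - begin + 1);

    // The whole line must parse as a number (leading spaces are allowed).
    char* parseEnd = nullptr;
    const long value = std::strtol(line.c_str(), &parseEnd, 10);
    if (parseEnd == line.c_str() || *parseEnd != '\0') return -1;
    if (value < 0 || value > INT_MAX) return -1;
    return static_cast<int>(value);
}

// Hypothetical helper: run gs and return the page count, or -1 on failure.
// Assumption: pdfPath needs no escaping (see the note above).
int queryPageCount(const std::string& pdfPath) {
    const std::string command =
        "gs -q -dNODISPLAY -c \"(" + pdfPath +
        ") (r) file runpdfbegin pdfpagecount = quit\"";

    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) return -1;

    // Collect everything gs writes to standard output.
    std::string output;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe)) output += buffer;

    // Treat a non-zero exit status as failure.
    const int status = pclose(pipe);
    return status == 0 ? parsePageCount(output) : -1;
}
```

Calling `queryPageCount("input.pdf")` should then give the page count, or -1 if gs could not be run or printed something unexpected. Keeping the parsing separate from the process call means the same `parsePageCount` could be fed the output captured by QProcess instead.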

Diederik