I have a question about the PoDoFo library, version 0.9.5, in C++ on Ubuntu 17.10. I am trying to load a PDF document produced by a scanner, but loading fails. Documents that were not scanned load properly.

pdf::Document::Document(const std::string &fname) {
    try {
        memDocument.Load(fname.c_str());
        LTRACE << "created pdf::Document from file";
    } catch (const PoDoFo::PdfError &error) {
        LERROR << "Error while loading PDF document(" << fname << "): " << PoDoFo::PdfError::ErrorMessage(error.GetError());
    }
}

memDocument is a PoDoFo::PdfMemDocument. When loading a scanned document, I get the following warning and the document is not loaded at all (memDocument stays empty):

WARNING: There are more objects (15) in this XRef table than specified in the size key of the trailer directory (8)!
<</ID[<DC15F9B0B1D5684CB68315FC2D09425E<DC15F9B0B1D5684CB68315FC2D09425E>]/Info 7 0 R/Root 9 0 R/Size 8>>

Has anybody had the same problem, or does anyone have any ideas?

Drise
  • Why c++14 tag? What does this have to do with c++14? – Drise Mar 16 '18 at 19:48
  • The error message sounds like there is an issue with your pdf file, more objects in the cross reference table than declared in the trailer. Can you share it for analysis? – mkl Mar 17 '18 at 08:43

1 Answer

The warning message

WARNING: There are more objects (15) in this XRef table than specified in the size key of the trailer directory (8)!
<</ID[<DC15F9B0B1D5684CB68315FC2D09425E<DC15F9B0B1D5684CB68315FC2D09425E>]/Info 7 0 R/Root 9 0 R/Size 8>>

indicates that the document in question is not well-formed: Its object cross reference table contains 15 entries while the document trailer declares that there are only 8.
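To make that mismatch concrete, here is a rough C++ sketch (not part of PoDoFo; both function names are hypothetical) that compares the size implied by the last xref table with the /Size value in the last trailer. The PDF specification defines /Size as one greater than the highest object number defined in the cross-reference table, so a table covering more objects than /Size allows is exactly the defect described above. The sketch assumes a classic, uncompressed xref table reachable through the byte offset after the last `startxref`; files using cross-reference streams are not handled, and it has not been tried on the scanned files themselves.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: returns the /Size implied by the last classic xref
// table (highest object number it covers plus one), or -1 if the table
// cannot be parsed. The table is located via the offset after the last
// "startxref". Cross-reference streams (PDF 1.5+) are not handled.
long xrefImpliedSize(const std::string& pdf) {
    std::size_t sx = pdf.rfind("startxref");
    if (sx == std::string::npos) return -1;
    std::istringstream offIn(pdf.substr(sx + 9));
    unsigned long offset = 0;
    if (!(offIn >> offset) || offset >= pdf.size() ||
        pdf.compare(offset, 4, "xref") != 0)
        return -1;
    std::size_t end = pdf.find("trailer", offset);
    if (end == std::string::npos) return -1;
    // Each subsection starts with a "first count" header, followed by
    // `count` entries of the form "offset generation n|f".
    std::istringstream in(pdf.substr(offset + 4, end - offset - 4));
    long first = 0, count = 0, size = 0;
    while (in >> first >> count) {
        if (first + count > size) size = first + count;
        std::string entryOffset, generation, type;
        for (long i = 0; i < count; ++i)
            if (!(in >> entryOffset >> generation >> type)) return -1;
    }
    return size;
}

// Hypothetical helper: returns the integer following "/Size" in the last
// trailer dictionary, or -1 if there is none.
long trailerSize(const std::string& pdf) {
    std::size_t t = pdf.rfind("trailer");
    if (t == std::string::npos) return -1;
    std::size_t s = pdf.find("/Size", t);
    if (s == std::string::npos) return -1;
    std::istringstream in(pdf.substr(s + 5));
    long size = 0;
    if (!(in >> size)) return -1;
    return size;
}
```

Both functions take the whole file contents as a std::string. If xrefImpliedSize(pdf) is larger than trailerSize(pdf), the file has the kind of damage the warning reports; for the document in the question, the trailer value would be 8.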

By the way, the trailer quoted in the warning also contains a broken ID: The '>' at the end of the first ID string is missing.

Thus, there appear to be multiple errors in the PDF which keep PoDoFo from loading it at all.

(Adobe Acrobat Reader silently ignores or repairs many errors in PDFs while loading, so you might not be aware of all those defects.)

mkl
  • Yeah, you're right, but every document I tried had the same problem (with different sizes, of course). I changed the size value manually in one document and it then loaded properly, but I'll have hundreds of documents. Is there some easy way to repair them automatically? – Pornosaurus Mar 17 '18 at 14:10
  • If they all have the same damage, have they perhaps all been post-processed by the same software with a bug causing this issue? If so, you should first of all stop that. Concerning an automatic repair: if you try to load those documents with PoDoFo, the warning contains pretty clear information about what has to be changed. That change should be easy to implement using regular file IO. – mkl Mar 17 '18 at 14:38
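A minimal C++ sketch of such a file-IO repair might look like this. It assumes the correct value is already known, for example the object count from PoDoFo's warning (15 in the question), and it only rewrites the number after /Size in the last trailer. Only the last trailer is touched because it follows its xref table: changing its length does not shift any object offset or the xref position recorded after `startxref`. The function names are hypothetical, the broken ID string is left as it is, and the output goes to a new file so the originals stay untouched.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: replaces the number after "/Size" in the LAST trailer
// dictionary with newSize. Returns false if no such entry is found.
bool patchTrailerSize(std::string& pdf, long newSize) {
    std::size_t t = pdf.rfind("trailer");
    if (t == std::string::npos) return false;
    std::size_t s = pdf.find("/Size", t);
    if (s == std::string::npos) return false;
    // Skip whitespace between "/Size" and the number, then find its digits.
    std::size_t numStart = s + 5;
    while (numStart < pdf.size() &&
           std::isspace(static_cast<unsigned char>(pdf[numStart])))
        ++numStart;
    std::size_t numEnd = numStart;
    while (numEnd < pdf.size() &&
           std::isdigit(static_cast<unsigned char>(pdf[numEnd])))
        ++numEnd;
    if (numEnd == numStart) return false;
    pdf.replace(numStart, numEnd - numStart, std::to_string(newSize));
    return true;
}

// Hypothetical wrapper: reads inPath in binary mode (PDFs contain binary
// streams), patches the trailer and writes the result to outPath.
bool patchFile(const std::string& inPath, const std::string& outPath,
               long newSize) {
    std::ifstream in(inPath, std::ios::binary);
    if (!in) return false;
    std::string pdf((std::istreambuf_iterator<char>(in)),
                    std::istreambuf_iterator<char>());
    if (!patchTrailerSize(pdf, newSize)) return false;
    std::ofstream out(outPath, std::ios::binary);
    out.write(pdf.data(), static_cast<std::streamsize>(pdf.size()));
    return static_cast<bool>(out);
}
```

After patching, loading the new file with PdfMemDocument::Load would show whether the size fix alone is enough; the broken ID may still cause trouble in some files.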