3

I have a program in Python (using pyPDF) that merges a bunch of different PDF documents. Sometimes the resulting PDF is fine except for some blank pages in the middle. When I view these documents with Acrobat Reader, I get an error message saying "insufficient data for image". When I view the documents with Foxit Reader, I get some blank pages and a munged image.

The only odd thing about the source PDF that produces the blank pages is that it seems to be PDF version 1.4, while pyPDF seems to create files with PDF version 1.3.

1) Does the version thing sound like the root cause of my problem?

2) Is there a way to get PyPdf to handle this correctly?

Chris Curvey

3 Answers

2

This might be related to Windows rather than to the .pdf file itself.

http://support.microsoft.com/kb/2506795

Good luck!

Andy
2

I had this problem, and was able to figure it out by looking at the original PDF side by side with the PyPDF one in a hex editor.

The problem seems to be that PyPDF drops a byte: it looks like the first byte of each image stream is missing. When I added the missing bytes back into the PyPDF file, the PDF opened fine without the error.
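The same side-by-side comparison can be roughly automated once the two stream payloads have been extracted as raw bytes. This is only a sketch: `first_difference` and `is_missing_leading_byte` are hypothetical helper names, and getting the stream bytes out of each file (by hand from the hex editor, or otherwise) is assumed to have been done already.

```python
def first_difference(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if identical.

    If one input is a strict prefix of the other, the index returned is
    the length of the shorter input.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


def is_missing_leading_byte(original: bytes, suspect: bytes) -> bool:
    """True if `suspect` is exactly `original` with its first byte dropped."""
    return len(suspect) == len(original) - 1 and original[1:] == suspect


# Illustrative data only: a made-up payload and a copy missing byte 0.
original_stream = b"\x78\x9c\x01\x02\x03"
merged_stream = original_stream[1:]

print(first_difference(original_stream, merged_stream))
print(is_missing_leading_byte(original_stream, merged_stream))
```

If the second check comes back true for every image stream, that matches the pattern described above.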

G H
1

I suspect that the image XObject stream is malformed. Without access to a PDF that shows the problem, all most folks can do is guess.

For example, if the PDF's image dictionary says the image is 10 pixels wide, 10 pixels high, and 8 bits per pixel, then the stream should uncompress to 100 bytes. If it uncompresses to fewer bytes than that, I'd expect an error like the one you're seeing.
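That arithmetic can be sketched as a small check. The function names here are hypothetical, and the sketch assumes you already have the decoded (uncompressed) stream bytes plus the width, height, bits per component, and number of color components from the image dictionary. Each row is rounded up to a whole byte.

```python
def expected_image_bytes(width: int, height: int,
                         bits_per_component: int,
                         components: int = 1) -> int:
    """Number of bytes the decoded image data should contain."""
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # pad each row up to a full byte
    return bytes_per_row * height


def is_short(decoded: bytes, width: int, height: int,
             bits_per_component: int, components: int = 1) -> bool:
    """True if the decoded data has fewer bytes than the image needs."""
    needed = expected_image_bytes(width, height, bits_per_component, components)
    return len(decoded) < needed


# The 10 x 10, 8-bit, single-component example from the text.
print(expected_image_bytes(10, 10, 8))

# The same image with three components (e.g. RGB) needs three times as much.
print(expected_image_bytes(10, 10, 8, components=3))

# A decoded stream that is one byte short would be flagged.
print(is_short(bytes(99), 10, 10, 8))
```

A stream that fails this check is a plausible trigger for an "insufficient data for image" error, though this is a guess at how the viewer decides, not a description of Acrobat's actual logic.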

This is probably a bug in pypdf regarding whatever image format you happen to be using.

IIRC, there is no scan-line padding to word boundaries in PDF, though the last bits of each row are padded out to a full byte if need be. Confusion there could easily lead to too many bytes, which isn't the problem here.

It could also be a bad color space. If you've got an indexed color image (GIF), and it gets translated halfway to an RGB image but still uses the original indexed color bytes, you'd get a stream that is expected to hold n*3 bits per pixel but only has n bits per pixel.

It's possible that this is an older bug that's been fixed in pypdf. Are you using the current version?

Mark Storer