I am using Python 2.4 and pyPdf 1.13 on Windows. I am trying to merge the PDF files in a list into a single file using the following code:

import os
from pyPdf import PdfFileWriter, PdfFileReader

attached=["C:\\test\\T_tech.pdf","C:\\test\\00647165-Backup.pdf"]
output=PdfFileWriter()
maxpage=0

os.chdir("C:\\test")
name= attached[0]
name = os.path.basename(name).split('.')[0]

for nam in attached:
    input= PdfFileReader(file(nam,"rb"))
    maxpage=input.getNumPages()
    for i in range(0,maxpage):
        output.addPage(input.getPage(i))

outputStream =file("Output-"+name+".pdf","wb")
output.write(outputStream)
outputStream.close()

I am getting the following error when I run this code.

  Traceback (most recent call last):
    File "C:\Python24\pdfmerge.py", line 13, in ?
       input= PdfFileReader(file(nam,"rb"))
    File "C:\Python24\Lib\site-packages\pyPdf\pdf.py", line 374, in __init__
       self.read(stream)
    File "C:\Python24\Lib\site-packages\pyPdf\pdf.py", line 847, in read
       assert False
  AssertionError

Any help is greatly appreciated.

sth
gaya3

1 Answer

From the pyPdf source (the read method in pdf.py, where your traceback ends):

            # bad xref character at startxref.  Let's see if we can find
            # the xref table nearby, as we've observed this error with an
            # off-by-one before.
            stream.seek(-11, 1)
            tmp = stream.read(20)
            xref_loc = tmp.find("xref")
            if xref_loc != -1:
                startxref -= (10 - xref_loc)
                continue
            else:
                # no xref table found at specified location
                assert False
                break

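You're hitting the latter branch, the "no cross reference table found..." condition: the offset given after startxref does not point at an xref table, and no "xref" keyword turns up in the 20 bytes around it either. Try patching the source to omit the assertion and see whether the merge still works.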
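To find out which input file trips this branch before patching anything, a rough diagnostic along the following lines may help. It is only a sketch and not part of pyPdf: the function name inspect_startxref and the labels it returns are made up for illustration, and it is written for a current Python 3 rather than 2.4. It reads the offset after the last startxref and looks at the same 20-byte window as the excerpt above. A digit at the offset usually means an object header, such as a PDF 1.5 cross-reference stream; whether pyPdf 1.13 copes with that is not shown in the excerpt.

```python
import re


def inspect_startxref(data):
    """Classify what the last startxref offset in raw PDF bytes points at.

    Hypothetical helper for diagnosis only; the labels are illustrative.
    """
    matches = re.findall(rb"startxref\s+(\d+)", data)
    if not matches:
        return {"offset": None, "kind": "missing"}

    offset = int(matches[-1])
    head = data[offset:offset + 1]

    if head == b"x":
        # Offset lands on the "xref" keyword: a classic cross-reference table.
        kind = "table"
    elif head.isdigit():
        # Offset lands on something like "12 0 obj": likely an xref stream.
        kind = "stream-or-object"
    else:
        # Same window as the excerpt: 10 bytes before to 10 bytes after.
        window = data[max(offset - 10, 0):offset + 10]
        kind = "nearby" if b"xref" in window else "bad"

    return {"offset": offset, "kind": kind}
```

To use it, read each file in binary mode, for example data = open(path, "rb").read(), and pass the bytes in. A result of "bad" corresponds to the assert False branch above, while "nearby" corresponds to the off-by-one recovery path.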

Brian Cain
  • This may have something to do with the fact that he is using Python 2.4. I cannot reproduce the error with Python 2.7.2 on Windows. – jsalonen Dec 01 '11 at 17:19
  • Hi Brian, thanks for the suggestion. I tried commenting the assertion out, but there was still no change. – gaya3 Dec 01 '11 at 18:02
  • "no change"? That seems really unlikely. At a minimum, if you're somehow hitting another assertion failure it should be on a different line. – Brian Cain Dec 01 '11 at 18:56
  • I found that the error occurs because one of the files is a scanned PDF. Is there any way I can merge scanned PDFs? – gaya3 Dec 01 '11 at 19:35
  • I don't see any reason why not. Those are just pages with a huge raster image in them. – Brian Cain Dec 01 '11 at 19:44