
I am trying to concatenate all the PDFs in a folder into a single PDF using the PyPDF2 library. I am using Python 2.7.

The error is:

>>>
 RESTART: C:\Users\Yash gupta\Desktop\first projectt\concatenate\test\New folder\test.py 
['Invoice.pdf', 'Invoice_2.pdf', 'invoice_3.pdf', 'last.pdf']

Traceback (most recent call last):
  File "C:\Users\Yash gupta\Desktop\first projectt\concatenate\test\New folder\test.py", line 17, in <module>
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
  File "C:\Python27\lib\site-packages\PyPDF2\pdf.py", line 1084, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\PyPDF2\pdf.py", line 1689, in read
    stream.seek(-1, 2)
IOError: [Errno 22] Invalid argument

My code is:

import PyPDF2, os
# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
      pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)
pdfWriter = PyPDF2.PdfFileWriter()

print ( pdfFiles)

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
print ( pdfFileObj )

# Loop through all the pages 
for pageNum in range(0, pdfReader.numPages):
    pageObj = pdfReader.getPage(pageNum)
    pdfWriter.addPage(pageObj)

# Save the resulting PDF to a file.
pdfOutput = open('last.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()

My PDFs contain some non-ASCII characters, so I am using 'r' rather than 'rb'.

PS: I am new to Python and to working with libraries.

Yash Gupta

1 Answer


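I believe you are looping through the collected files incorrectly (Python is indentation-sensitive). The page loop has to be nested inside the file loop; otherwise only the last file opened contributes any pages.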

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # Loop through all the pages of the current file.
    for pageNum in range(0, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# Save the resulting PDF once, after every file has been added.
pdfOutput = open('last.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
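
One more thing to check: the file list printed in the question includes last.pdf, which is also the output name. If a previous run left an empty last.pdf behind, that may be the file PdfFileReader fails on at stream.seek(-1, 2), since there is nothing to seek back into. I can't confirm that from the traceback alone, so treat the following as a minimal sketch under that assumption. collect_input_pdfs is a hypothetical helper name, and unlike the original loop it matches the .pdf extension case-insensitively.

```python
import os


def collect_input_pdfs(folder, output_name):
    """Return sorted PDF filenames in folder, skipping the output file and empty files."""
    names = []
    for name in os.listdir(folder):
        if not name.lower().endswith('.pdf'):
            continue
        if name.lower() == output_name.lower():
            continue  # never feed the merged result back in as an input
        if os.path.getsize(os.path.join(folder, name)) == 0:
            continue  # a zero-byte file has nothing to parse
        names.append(name)
    # Case-insensitive sort that works for both str and unicode names.
    names.sort(key=lambda n: n.lower())
    return names
```

With this, pdfFiles = collect_input_pdfs('.', 'last.pdf') would replace the first loop in the question.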

Also, try to use PdfFileMerger if you want to merge PDF files:

merger = PdfFileMerger(strict=False)
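
As far as I know, strict=False makes PyPDF2 warn about some malformed-PDF conditions instead of raising, which is why it is suggested for the "Multiple definitions in dictionary" PdfReadError reported in the comments. Here is a rough sketch of how the merger could be used end to end. It assumes a PyPDF2 release that still provides PdfFileMerger (the same generation as PdfFileReader in the traceback). merge_pdfs and its optional merger parameter are hypothetical names added for illustration, not part of PyPDF2.

```python
def merge_pdfs(paths, output_path, merger=None):
    """Append every PDF in paths, in order, and write the result to output_path."""
    if merger is None:
        # Assumption: an older PyPDF2 (1.x / 2.x) where PdfFileMerger is available.
        from PyPDF2 import PdfFileMerger
        merger = PdfFileMerger(strict=False)
    try:
        for path in paths:
            # append() accepts a path string and opens the file in binary mode itself.
            merger.append(path)
        merger.write(output_path)
    finally:
        # Release the file handles the merger opened.
        merger.close()
```

For example, merge_pdfs(['Invoice.pdf', 'Invoice_2.pdf', 'invoice_3.pdf'], 'last.pdf') would write the three invoices into last.pdf, leaving the output file out of the inputs.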

Check out the example code here.

errata
  • Thanks, I did that, but now I get this error: File "C:\Python27\lib\site-packages\PyPDF2\generic.py", line 585, in readFromStream % (utils.hexStr(stream.tell()), key)) PdfReadError: Multiple definitions in dictionary at byte 0x2695 for key /Type. Can you help? – Yash Gupta May 31 '17 at 09:25
  • @YashGupta Updated my answer. – errata May 31 '17 at 09:34