This is my first Python script. The writer raises an error that seems to occur at random while looping through the PDFs.

Wrapping the write in `try: ... except: pass` will not work, because it would just skip the file with the issue and not produce an output for it.

`strict=False` does not seem to work for the writer.

The error:

PdfReadWarning: Multiple definitions in dictionary at byte 0x6eb54 for key /PageMode [generic.py:587]
PdfReadWarning: Multiple definitions in dictionary at byte 0x75740 for key /PageMode [generic.py:587]
PdfReadWarning: Multiple definitions in dictionary at byte 0xabc13 for key /PageMode [generic.py:587]
Traceback (most recent call last):
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\kmincey.BCSBLOCAL\.vscode\extensions\ms-python.python-2022.4.0\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\kmincey.BCSBLOCAL\.vscode\extensions\ms-python.python-2022.4.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\Users\kmincey.BCSBLOCAL\.vscode\extensions\ms-python.python-2022.4.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\kmincey.BCSBLOCAL\Desktop\Python_scripts\PDFsealer_V2.py", line 56, in <module>
    output_pdf.write(f)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 68, in readObject
    return readHexStringFromStream(stream)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 311, in readHexStringFromStream
    raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly

I have read several posts saying that `strict=False` needs to be passed to the reader so that these problems are reported as warnings rather than errors: https://stackoverflow.com/questions/42570432/pypdf2-stream-has-ended-unexpectedly, https://github.com/mstamy2/PyPDF2/issues/99. This worked in most cases; however, the writer now seems to be the problem.

Thanks in advance for any advice.

For loop snippet for reference:

for file in input_pdf:
    output_pdf = PdfFileWriter()
    sg.OneLineProgressMeter('My Meter', i, page_count, 'And now we Wait.....')
    PageObj = PyPDF2.PdfFileReader(open(file, "rb"), strict=False).getPage(0)
    PageObj.scaleTo(11*72, 17*72)
    PageObj.mergePage(Seal_pdf.getPage(0))
    output_pdf.addPage(PageObj)

    output_filename = f"{file}"
    f = open(output_filename, "wb+")
    output_pdf.write(f)
    i = i + 1
    f.close()
Rand0mdude
  • No idea, but have you checked the integrity of the PDF? Have you tried opening it with Adobe? – cards Apr 01 '22 at 20:40
  • @cards Yes. The PDF opens fine, and each PDF is printed via the same method from the same computer, by me. I was hoping there was a way to pass the error as a warning like `strict=False` does for the reader. That way it would still proceed without erroring out. The problem may be that I am making a "PDF sandwich" out of two files. The fix might be to find a way to use a watermark or stamp instead to achieve the same goal. – Rand0mdude Apr 01 '22 at 22:46
  • `PdfFileMerger(strict=False)` could be an idea: avoid `PdfFileWriter`, since it doesn't support `strict`. You would need to restructure your code a bit; see the [doc](https://pythonhosted.org/PyPDF2/PdfFileMerger.html). – cards Apr 01 '22 at 23:14
  • @KJ I think you are on to something with the file being in memory. By only changing the output file name line to `output_filename = f"{file[:-4]}_sealed.pdf"` it will run with no errors. The problem appears to be that I am attempting to overwrite the file while it is still in use by the code. I was hoping to actually be able to overwrite the files as they are manipulated. – Rand0mdude Apr 05 '22 at 23:04
  • Just stumbled across this now that I know the overwriting seems to be the issue. Problem is, I am not sure how to incorporate the solution into my code. https://stackoverflow.com/questions/2746758/how-do-i-overwrite-a-file-currently-being-read-by-python – Rand0mdude Apr 05 '22 at 23:12

1 Answer

Thanks to the helpful input from @cards and @KJ, I discovered that the problem was my attempt to overwrite a file that was still in use. Because the original was still being read, it was corrupted by the time the writer got to it. The solution I went with was simply to save the file under a different name and write some more code to clean up the directory. Thanks for the assist.
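
For anyone who does need the originals overwritten in place, one possible approach is sketched below. Opening a path with `"wb+"` truncates it immediately, and PyPDF2 reads page objects lazily from the open input stream, so the writer ends up reading from an emptied file. The sketch writes to a temporary file in the same directory and only swaps it over the original once writing has succeeded. The helper name `replace_file_atomically` and its `write_fn` callback are hypothetical, not part of PyPDF2, and the PyPDF2 usage in the trailing comments is an untested adaptation of the loop from the question.

```python
import os
import tempfile


def replace_file_atomically(path, write_fn):
    """Write new content to a temp file next to `path`, then swap it in.

    `write_fn` (a hypothetical callback) receives a binary file object
    opened for writing. The original file is not touched until `write_fn`
    has returned without raising.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_fn(tmp)
        # Replace the original only after the new file is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial temp file and keep the original.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Sketch of how the loop from the question might use it (assumes PyPDF2
# and the question's Seal_pdf object; not tested against real PDFs):
#
# for file in input_pdf:
#     def seal(tmp, file=file):
#         # Close the source before the helper replaces it; on Windows the
#         # replace fails while another handle still has the file open.
#         with open(file, "rb") as src:
#             page = PyPDF2.PdfFileReader(src, strict=False).getPage(0)
#             page.scaleTo(11 * 72, 17 * 72)
#             page.mergePage(Seal_pdf.getPage(0))
#             writer = PdfFileWriter()
#             writer.addPage(page)
#             writer.write(tmp)
#     replace_file_atomically(file, seal)
```

The callback opens and closes the source itself, so the read handle is released before the swap happens. If a file fails partway through, the original is left as it was.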

Rand0mdude