How do I solve the camelot-py read_pdf error "EOF marker not found"?

Question

I'm using a text-based PDF, as Camelot requires, and trying to extract its tables with the flavor='stream' option. When I run the Python script, I get this error:

File "/path/foo.py", line x, in <module>
File "/path/foo.py", line x, in read_pdf
File "/path/foo.py", line x, in parse
    self._save_page(self.filepath, p, tempdir)
File "/path/foo.py", line x, in _save_page
    infile = PdfFileReader(fileobj, strict=False)
File "/path/foo.py", line x, in __init__
    self.read(stream)
File "/path/foo.py", line x, in read
    raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found

I know this refers to the end-of-file (EOF) marker, but I did not generate the PDFs I am trying to parse. It would be very inconvenient if the problem were in the source files, since they are all produced the same way.
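For context: the `PyPDF2.utils.PdfReadError` in the traceback suggests PyPDF2 1.x, whose reader (as far as I can tell from its source) only looks for a line starting with `%%EOF` within roughly the last 1024 bytes of the file. A quick way to narrow down the cause is to inspect the raw bytes. The sketch below is a rough approximation of that check, not PyPDF2's actual code. `check_pdf_bytes` is a hypothetical helper name, and the file path is assumed to come from the command line.

```python
import sys


def check_pdf_bytes(data, window=1024):
    """Return (starts_like_pdf, eof_in_tail) for the raw bytes of a file.

    starts_like_pdf: True if the data begins with the b"%PDF-" header.
    eof_in_tail: True if b"%%EOF" occurs anywhere in the last `window` bytes,
    a rough approximation of the tail scan PyPDF2 1.x performs.
    """
    starts_like_pdf = data.startswith(b"%PDF-")
    eof_in_tail = b"%%EOF" in data[-window:]
    return starts_like_pdf, eof_in_tail


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pdf.py path/to/file.pdf
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    starts_like_pdf, eof_in_tail = check_pdf_bytes(data)
    print(f"size: {len(data)} bytes")
    print(f"starts with %PDF- header: {starts_like_pdf}")
    print(f"%%EOF in last 1024 bytes: {eof_in_tail}")
```

If the header check fails, the file may not be a PDF at all (for example, an HTML page saved with a .pdf extension). If only the EOF check fails, the file is likely truncated or has extra bytes appended after the marker.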

The line of code I'm using to read is this:

import camelot as cam

table = cam.read_pdf(fname, flavor='stream')
table

The last line is meant to display the table in the command line.
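One thing worth knowing here: `read_pdf` returns a `TableList`, not a single table, and each `Table` in it exposes its cells as a pandas DataFrame via `.df`. A bare `table` expression only echoes a value in an interactive session (the Python REPL or Jupyter). In a script it prints nothing. A minimal sketch, assuming the PDF path is passed on the command line; `format_tables` and `main` are hypothetical helper names.

```python
import sys


def format_tables(tables):
    """Build printable text for a camelot TableList.

    Works with any sequence whose items expose a `.df` attribute
    (for camelot, a pandas DataFrame holding the table's cells).
    """
    lines = [f"Found {len(tables)} table(s)"]
    for i, table in enumerate(tables):
        lines.append(f"--- Table {i} ---")
        lines.append(str(table.df))
    return "\n".join(lines)


def main(pdf_path):
    import camelot  # imported lazily so format_tables can be used without it

    # pages="all" reads every page; read_pdf only parses page 1 by default.
    tables = camelot.read_pdf(pdf_path, flavor="stream", pages="all")
    # A bare `tables` expression prints nothing in a script, so print explicitly.
    print(format_tables(tables))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Instead of printing, the DataFrames can also be written out with `tables.export("out.csv", f="csv")` or worked with directly via `tables[0].df`.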

You can try qpdf or pikepdf (a Python wrapper for qpdf) to repair the corrupted PDF file. Let me know if it works. — Stefano Fiorucci - anakin87, Jun 15 '20 at 12:30
Thanks! I managed to use pikepdf to open and re-save the PDF (result: two identical-looking PDFs). Now the script runs without errors, though I still have to figure out how to make the Camelot output show. — pandaero, Jun 17 '20 at 00:10
Well, I ran the new PDF through Excalibur, the web interface for Camelot, and it worked fine. I'm guessing I can use it to bridge the gap while I'm still a newbie. — pandaero, Jun 17 '20 at 00:27
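The pikepdf fix described in the comments can be scripted so that every incoming PDF is re-saved before Camelot sees it. This is a minimal sketch, assuming pikepdf and camelot-py are installed. It relies on qpdf (which pikepdf wraps) attempting recovery of damaged files when opening them, and on saving producing a fresh file with a proper trailer. `repaired_path` and `repair_pdf` are hypothetical helpers, and the `.repaired.pdf` naming is an arbitrary choice. The qpdf command-line equivalent would be `qpdf damaged.pdf repaired.pdf`.

```python
import sys
from pathlib import Path


def repaired_path(src):
    """Hypothetical naming scheme: foo.pdf -> foo.repaired.pdf."""
    p = Path(src)
    return str(p.with_name(p.stem + ".repaired" + p.suffix))


def repair_pdf(src, dst=None):
    """Open `src` with pikepdf and save a fresh copy to `dst`.

    qpdf (which pikepdf wraps) tries to recover damaged files on open,
    and saving writes a new file with a proper trailer and %%EOF marker.
    """
    import pikepdf  # third-party: pip install pikepdf

    dst = dst or repaired_path(src)
    with pikepdf.open(src) as pdf:
        pdf.save(dst)  # writes to a separate file, leaving the original untouched
    return dst


if __name__ == "__main__" and len(sys.argv) > 1:
    import camelot  # third-party: pip install camelot-py

    fixed = repair_pdf(sys.argv[1])
    tables = camelot.read_pdf(fixed, flavor="stream")
    print(tables)  # summary of the TableList
    for table in tables:
        print(table.df)  # each table's cells as a pandas DataFrame
```

Saving to a separate file keeps the original around for comparison, which matches what the asker observed: two visually identical PDFs, only one of which PyPDF2 can read.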