I'm using a text-based PDF, as Camelot requires, and trying to read its tables with the flavor='stream' option. When I run the Python script, I get this error:

File "/path/foo.py", line x, in <module>
File "/path/foo.py", line x, in read_pdf
File "/path/foo.py", line x, in parse
    self._save_page(self.filepath, p, tempdir)
File "/path/foo.py", line x, in _save_page
    infile = PdfFileReader(fileobj, strict=False)
File "/path/foo.py", line x, in __init__
    self.read(stream)
File "/path/foo.py", line x, in read
    raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found

I know this refers to the End-Of-File marker, but I did not generate the PDFs I am trying to parse. It would be very inconvenient if the problem were in the source files, since they are all produced the same way.
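
For context, a PDF is expected to end with a `%%EOF` line, and this error is raised when the reader cannot find that marker near the end of the file. Here is a minimal sketch for checking a file by hand, using only the standard library. The helper name `has_eof_marker` and the 1024-byte window are illustrative choices, not part of PyPDF2 or Camelot:

```python
import os

def has_eof_marker(path, tail_bytes=1024):
    """Return True if b'%%EOF' appears in the last `tail_bytes` bytes of the file."""
    with open(path, "rb") as f:
        # Find the file size, then jump to the start of the tail window.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        return b"%%EOF" in f.read()
```

If this returns False for a file, the marker is missing or buried under trailing data, which would be consistent with the traceback above.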

The line of code I'm using to read is this:

import camelot as cam  # assumed alias; fname is the path to the PDF

table = cam.read_pdf(fname, flavor='stream')
table

The last line is there to display the table on the command line.
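
One thing to note: a bare `table` line only echoes a value in an interactive session. In a script run from the command line it prints nothing. Here is a minimal sketch of printing the result explicitly, assuming `read_pdf` returns a list-like collection whose items expose a pandas DataFrame as `.df` (as Camelot's tables do). `show_tables` is a hypothetical helper name:

```python
def show_tables(tables):
    """Print each table's DataFrame and return how many tables were printed."""
    count = 0
    for i, t in enumerate(tables):
        print(f"--- table {i} ---")
        print(t.df)  # .df is assumed to hold the table contents
        count += 1
    return count
```

Usage would look something like `show_tables(cam.read_pdf(fname, flavor='stream'))`. A return value of 0 would mean no tables were detected at all.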

pandaero
  • You can try qpdf or pikepdf (a Python wrapper for qpdf) to fix the corrupted PDF file. Let me know if it works... – Stefano Fiorucci - anakin87 Jun 15 '20 at 12:30
  • Thanks! I managed to use pikepdf to open and save the PDF (result: two identical-looking PDFs). Now the script runs without errors, though I still have to figure out how to make the Camelot output show. – pandaero Jun 17 '20 at 00:10
  • Well, I used the new PDF with Excalibur, the web interface for Camelot, and it worked fine. I'm guessing I can use this to bridge my newbie-user journey. – pandaero Jun 17 '20 at 00:27
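
The open-and-save fix described in the comments can be sketched roughly as follows, assuming pikepdf is installed. The function names and the `_repaired` suffix are hypothetical, chosen only for illustration. Whether a resave fixes a given file depends on how it is damaged:

```python
from pathlib import Path

def repaired_path(src):
    """Build an output path next to `src`, e.g. report.pdf -> report_repaired.pdf."""
    p = Path(src)
    return p.with_name(p.stem + "_repaired" + p.suffix)

def resave_pdf(src, dst=None):
    """Open `src` with pikepdf and write it back out as a fresh file."""
    import pikepdf  # third-party; imported lazily so repaired_path works without it

    dst = Path(dst) if dst else repaired_path(src)
    with pikepdf.open(src) as pdf:
        pdf.save(dst)  # rewrites the file structure, including the trailer
    return dst
```

The path returned by `resave_pdf(fname)` could then be passed to `read_pdf` in place of the original file.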

0 Answers