I'm trying to read a PDF file that I extract in memory from a zip file, so that I can get the tables inside it. Camelot seems like a good way to do it, but I'm getting the following error:

AttributeError: '_io.StringIO' object has no attribute 'lower'

Is there some way to read the file and extract the tables with camelot, or should I use another library?

z = zipfile.ZipFile(self.zip_file)
for file in z.namelist():
    if file.endswith(".pdf"):
        pdf = z.read(file).decode(encoding="latin-1")
        pdf = StringIO(pdf)
        pdf = camelot.read_pdf(pdf, codec='utf-8')
Daniel

1 Answer

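camelot.read_pdf(filepath, ...) accepts a file path (a string) as its first parameter. That is why passing a StringIO fails: the error shows the argument being treated as a string, with lower() called on it. It appears to be a bad match for reading PDFs held in memory, so consider another library.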
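If you would rather stay with camelot, one possible workaround is to write each PDF from the archive to a temporary file and pass that path instead. The sketch below is only an illustration under some assumptions: extract_pdfs and read_tables are hypothetical helper names, camelot is assumed to be installed, and the tables are assumed to be fully parsed before the temporary directory is cleaned up.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_pdfs(zip_source, dest_dir):
    """Write every .pdf member of the archive into dest_dir and return the paths.

    zip_source may be a path or a binary file-like object such as io.BytesIO.
    """
    dest = Path(dest_dir)
    paths = []
    with zipfile.ZipFile(zip_source) as archive:
        for index, name in enumerate(archive.namelist()):
            if not name.lower().endswith(".pdf"):
                continue
            # A PDF is binary, so the data stays as raw bytes and is never decoded.
            # The index prefix avoids clashes between members sharing a base name.
            target = dest / f"{index}_{Path(name).name}"
            target.write_bytes(archive.read(name))
            paths.append(target)
    return paths


def read_tables(zip_source):
    """Hypothetical helper: run camelot on each PDF found in the archive."""
    import camelot  # third-party; imported lazily so extract_pdfs works without it

    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for path in extract_pdfs(zip_source, tmp):
            # read_pdf is given a plain path string, which is what it expects.
            results[path.name] = camelot.read_pdf(str(path))
    return results
```

With the code from the question, this would be called as read_tables(self.zip_file).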

In any case, StringIO(pdf) gives you a stream object, not the file's contents. Printing it shows the following:

<_io.StringIO object at 0x000002592DD33E20>

For starters, to get the contents out of a StringIO, call its read() method:

pdf = StringIO(pdf) 
pdf.read()

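That returns the contents of the stream, but as a str rather than bytes, because StringIO holds text. A PDF is binary data, so it is safer to keep the result of z.read(file) as bytes (wrapped in BytesIO if a file-like object is needed) than to decode it. Next, think about what kind of input the library will actually accept.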
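To make the text versus bytes distinction concrete, here is a small sketch using only the standard library. The raw value is a made-up stand-in for whatever z.read(file) returns, not real PDF content.

```python
import io

# Stand-in for the bytes returned by z.read(file); not a real PDF.
raw = b"%PDF-1.4\n\x80\xff"

# StringIO stores text, so the bytes have to be decoded first.
text_stream = io.StringIO(raw.decode("latin-1"))
text_value = text_stream.read()

# BytesIO stores the bytes unchanged, with no decoding step.
byte_stream = io.BytesIO(raw)
byte_value = byte_stream.read()

print(type(text_value).__name__)  # str
print(type(byte_value).__name__)  # bytes
print(byte_value == raw)          # True
```

A library that accepts a binary file-like object could be given the BytesIO directly, while one that only accepts a path needs the bytes written to disk first.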