I am trying to read tables from a PDF file using camelot.

tables = camelot.read_pdf(file, pages = "1-end")

File "extract_data.py", line 88, in readpdftable
    tables = camelot.read_pdf(file, pages = "1-end")
File "\Myapp\upload\myenv\Lib\site-packages\camelot\io.py", line 113, in read_pdf
    tables = p.parse(
File "\Myapp\upload\myenv\Lib\site-packages\camelot\handlers.py", line 176, in parse
    t = parser.extract_tables(
File "\Myapp\upload\myenv\Lib\site-packages\camelot\parsers\lattice.py", line 421, in extract_tables
    self.backend.convert(self.filename, self.imagename)
File "Myapp\upload\myenv\Lib\site-packages\camelot\backends\ghostscript_backend.py", line 47, in convert
    ghostscript.Ghostscript(*gs_command)
File "Myapp\upload\myenv\Lib\site-packages\ghostscript\__init__.py", line 138, in Ghostscript
    return _Ghostscript(instance, args)
File "\Myapp\upload\myenv\Lib\site-packages\ghostscript\__init__.py", line 69, in __init__
    rc = gs.init_with_args(instance, args)
File "\Myapp\upload\myenv\Lib\site-packages\ghostscript\_gsprint.py", line 262, in init_with_args
    c_argv = ArgArray(*argv)

TypeError: bytes or integer address expected instead of str instance


I then converted the file name to bytes:

file = bytes(file,'utf-8')
tables = camelot.read_pdf(file, pages = "1-end")

Now I get the following error:

File "\Myapp\upload\extract_data.py", line 88, in readpdftable
    tables = camelot.read_pdf(file, pages = "1-end")
File "\Myapp\upload\myenv\Lib\site-packages\camelot\io.py", line 111, in read_pdf
    p = PDFHandler(filepath, pages=pages, password=password)
File "\Myapp\upload\myenv\Lib\site-packages\camelot\handlers.py", line 41, in __init__
    if not filepath.lower().endswith(".pdf"):

TypeError: endswith first arg must be bytes or a tuple of bytes, not str
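
This second error can be reproduced without camelot. The traceback shows the handler calling filepath.lower().endswith(".pdf") with a str suffix, which a bytes path cannot satisfy. A minimal sketch, assuming a hypothetical file name "report.pdf" and a helper has_pdf_suffix that only mirrors the check quoted in the traceback (it is not camelot's actual code):

```python
def has_pdf_suffix(filepath):
    # Mirrors the check quoted in the traceback; hypothetical helper, not camelot code.
    return filepath.lower().endswith(".pdf")


name = "report.pdf"  # hypothetical file name, for illustration only

# A str path passes the check.
str_result = has_pdf_suffix(name)

# A bytes path fails: bytes.endswith() does not accept a str suffix.
try:
    has_pdf_suffix(bytes(name, "utf-8"))
    bytes_error = None
except TypeError as exc:
    bytes_error = str(exc)

print(str_result)
print(bytes_error)
```

So converting the file name to bytes only moves the failure to an earlier line; the path passed to read_pdf should stay a str.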


The same code works fine in a Jupyter notebook inside Anaconda, without converting the file name to bytes. But when I run the same code as a .py script, the above problem arises.

Could anyone please help me? Thanks.

Poongodi

1 Answer

The .py script was running in a different environment from the notebook. Installing 'ghostscript==0.7' in that environment solved the problem!
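
To check which environment a script actually runs in, something like the following can be executed both from the notebook and as a .py file, and the two outputs compared. This is only a sketch: the helper names installed_version and describe_environment are made up for illustration, and the distribution names "ghostscript" and "camelot-py" are taken from the pins mentioned in this thread. It needs Python 3.8+ for importlib.metadata.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(dist_names):
    """Collect the interpreter location and the versions of the given distributions."""
    info = {"executable": sys.executable, "prefix": sys.prefix}
    for name in dist_names:
        info[name] = installed_version(name)
    return info


if __name__ == "__main__":
    # Distribution names as pinned in this thread; adjust to your own setup.
    for key, value in describe_environment(["ghostscript", "camelot-py"]).items():
        print(f"{key}: {value}")
```

If the executable paths differ between the two runs, or a package shows up as None or with a different version in one of them, the script and the notebook are not using the same environment.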

Poongodi
  • Hi, I'm having the same issue and have had no luck fixing it so far. I'm using ghostscript v0.7 in the correct environment, but I still get the error. Could you share the specific versions of camelot, PyPDF, etc. that you used? – JasperMW Feb 06 '23 at 09:59
  • I used PyPDF2==1.26.0, camelot-py==0.10.1, python3_ghostscript-0.5.0-py3-none-any.whl, pdfminer.six==20211012, etc. – Poongodi Feb 07 '23 at 10:50
  • My requirements.txt:
      python-docx==0.8.11
      Flask==1.1.2
      pandas==1.2.2
      pdfminer.six==20211012
      PyPDF2==1.26.0
      Werkzeug==1.0.1
      itsdangerous==2.0.1
      jinja2==3.0.3
      openpyxl==3.0.9
      camelot-py==0.10.1
      opencv-python==4.5.5.64
      python3_ghostscript-0.5.0-py3-none-any.whl
      ghostscript==0.7
      pymysql==1.0.2
      sshtunnel==0.4.0
      paramiko==2.8.1
      mysql==0.0.3
      mysql-connector-python-rf
    – Poongodi Feb 07 '23 at 10:50