
I want to extract data from a table in a PDF file using the camelot-py library in my Django project. But when I run some simple code, it raises this traceback:

Traceback (most recent call last): 
File "<console>", line 1, in <module> 
File "C:\Users\myuser\Desktop\Project\siteproject\.venv\lib\site-packages\camelot\io.py", line 117, in read_pdf 
  **kwargs 
File "C:\Users\myuser\Desktop\Project\siteproject\.venv\lib\site-packages\camelot\handlers.py", line 177, in parse 
  p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs 
File "C:\Users\myuser\Desktop\Project\siteproject\.venv\lib\site-packages\camelot\parsers\lattice.py", line 423, in extract_tables 
  self._generate_table_bbox() 
File "C:\Users\myuser\Desktop\Project\siteproject\.venv\lib\site-packages\camelot\parsers\lattice.py", line 259, in _generate_table_bbox 
  c=self.threshold_constant, 
File "C:\Users\myuser\Desktop\Project\siteproject\.venv\lib\site-packages\camelot\image_processing.py", line 36, in adaptive_threshold 
  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
cv2.error: OpenCV(4.5.3) C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-c2l3r8zm\opencv\modules\imgproc\src\color.cpp:182: 
error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'

My code:

import camelot

pdf_file = 'C:/Users/myuser/Desktop/statement_7022035.pdf'
csv_file = 'C:/Users/myuser/Desktop/ex.csv'

def export_csv(pdf_file, csv_file):
    tables = camelot.read_pdf(pdf_file)
    tables.export(csv_file, f='csv', compress=True)
  • OS: Windows 10
  • Python version: 3.7.5
  • All dependencies for Windows were installed successfully (Ghostscript and ActiveTcl).
  • camelot-py was installed with pip (pip install "camelot-py[base]").
  • My file is a text-based PDF.

Please tell me where I have made a mistake.

Garry
    you'll have to debug that "camelot" package. `img` is empty. make sure the file is actually findable. relative paths are often an issue for newbies. and so are windows paths with forward slashes... – Christoph Rackwitz Sep 05 '21 at 19:30
  • @ChristophRackwitz you were right, debugging was very helpful. I solved the problem. The fix is to pass the argument `flavor="stream"` to the function: `tables = camelot.read_pdf(pdf_file, flavor="stream")`. And now it works. – Garry Sep 12 '21 at 20:28
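
For anyone landing here with the same error, the two suggestions from the comments can be combined into one function. This is only a sketch: it keeps the `export_csv` name and the `read_pdf` / `export` calls from the question, while the `flavor` parameter default, the early `FileNotFoundError`, and the `ValueError` for "no tables found" are additions made for illustration. The thread does not establish why the default lattice parser ended up with an empty image, so the path check only rules out one possible cause.

```python
from pathlib import Path


def export_csv(pdf_file, csv_file, flavor="stream"):
    """Extract tables from pdf_file with Camelot and export them as CSV.

    Returns the number of tables found.
    """
    pdf_path = Path(pdf_file)

    # Rule out a bad path first, so a missing file is reported plainly
    # instead of surfacing later as an error deep inside the parser.
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")

    # Imported lazily (an illustrative choice) so the path check above
    # can run even on a machine where camelot is not installed.
    import camelot

    # flavor="stream" is the fix reported in the comments; Camelot's
    # default is flavor="lattice", the code path shown in the traceback.
    tables = camelot.read_pdf(str(pdf_path), flavor=flavor)

    # Assumption for this sketch: treat "no tables detected" as an error.
    if tables.n == 0:
        raise ValueError(f"No tables found in {pdf_path} with flavor={flavor!r}")

    # Same export call as in the question.
    tables.export(str(csv_file), f="csv", compress=True)
    return tables.n
```

With the paths from the question it would be called as `export_csv('C:/Users/myuser/Desktop/statement_7022035.pdf', 'C:/Users/myuser/Desktop/ex.csv')`. Passing `flavor="lattice"` switches back to the default parser, which may be the better choice for PDFs whose tables are drawn with visible ruling lines.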
