I'm getting the error below while trying to extract table data from PDFs. Can anyone help?

import camelot

# PDF file to extract tables from
file = input_folder+file_name

tables = camelot.read_pdf(file)

# number of tables extracted
print("Total tables extracted:", tables.n)

# print the first table as Pandas DataFrame
print(tables[0].df)

Error: AttributeError: module 'camelot' has no attribute 'read_pdf'

2 Answers

This error most likely occurred because you installed the wrong package.

To install the Camelot module, you should have used this:

pip install camelot-py[cv]

If not, uninstall the package you installed and use the above command.
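
To check which package is actually installed before uninstalling anything, a small diagnostic along these lines may help. It is only a sketch: the helper names `installed_version` and `diagnose` are made up for illustration, and it assumes that the wrong package was installed under the distribution name `camelot`, which may not match your case.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose():
    """Report the versions of the two distributions most likely to be confused.

    'camelot-py' is the distribution named in the pip command above;
    'camelot' is an assumed name for a wrongly installed package.
    """
    return {name: installed_version(name) for name in ("camelot-py", "camelot")}


if __name__ == "__main__":
    for name, version in diagnose().items():
        print(f"{name}: {version or 'not installed'}")
```

If `camelot-py` shows as not installed while another distribution is present, that would be consistent with the explanation above.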

DapperDuck

I encountered the same problem and tried many things, including installing and uninstalling various camelot packages and cloning the Git repository. None of it worked for me. I found that the issue was related to cv2 (OpenCV). Server (headless) environments do not have GUI packages installed, so if you are using Camelot on a server with no GUI, you should install opencv-python-headless first:

pip install opencv-python-headless

and then import cv2 along with camelot.io instead of camelot:

import camelot.io as camelot
import cv2
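
To see which of these imports is failing on a given machine, something like the following sketch could be run first. The function name `check_camelot_setup` is hypothetical, and the check only covers the two modules mentioned above; it does not try to open a PDF.

```python
import importlib


def check_camelot_setup():
    """Return a list of problem descriptions; an empty list means the imports worked."""
    problems = []
    # Try each module separately so a cv2 failure is not mistaken for a camelot one.
    for name in ("cv2", "camelot.io"):
        try:
            importlib.import_module(name)
        except Exception as exc:  # import errors vary by environment
            problems.append(f"{name}: {type(exc).__name__}: {exc}")
    if not problems:
        camelot_io = importlib.import_module("camelot.io")
        if not hasattr(camelot_io, "read_pdf"):
            problems.append("camelot.io imported but has no read_pdf attribute")
    return problems


if __name__ == "__main__":
    issues = check_camelot_setup()
    print("\n".join(issues) if issues else "cv2 and camelot.io imported successfully")
```

An error reported for `cv2` would point toward the headless OpenCV fix described here, while an error for `camelot.io` alone would point back toward a packaging problem.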
Ed Cher
  • Thank you! I was trying to run Camelot in RStudio Server (hosted on a headless machine) and was getting cryptic errors. I only found your answer after running the same code outside the IDE and getting an `unable to open X server` error. It would have taken me ages to figure that out :-) In my otherwise bare pyenv, `camelot.read_pdf()` also required `pip install ghostscript`. – solarchemist Mar 20 '23 at 00:09