Questions tagged [python-camelot]

Camelot is a Python library that makes it easy for anyone to extract tabular data from PDF files.

Official web site

Why Camelot?

  • You are in control. Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
  • Bad tables can be discarded based on metrics like accuracy and whitespace, without ever having to manually look at each table.
  • Each table is a pandas DataFrame, which seamlessly integrates into ETL and data analysis workflows.
  • Export to multiple formats, including JSON, Excel and HTML.
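The point about discarding bad tables by metric can be sketched as a small filter. This is a minimal sketch, assuming each table exposes a `parsing_report` dict with `accuracy` and `whitespace` keys, as Camelot tables do; the helper name `keep_good_tables` and both thresholds are hypothetical, and stand-in objects are used so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def keep_good_tables(tables, min_accuracy=90.0, max_whitespace=30.0):
    """Return the tables whose parsing report passes both thresholds.

    Assumes each table has a `parsing_report` dict with "accuracy" and
    "whitespace" keys (percentages). The default thresholds are illustrative.
    """
    good = []
    for table in tables:
        report = table.parsing_report
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            good.append(table)
    return good


# Stand-in objects so the sketch runs without a PDF. With Camelot installed,
# `tables` would instead be the result of camelot.read_pdf(...).
fake_tables = [
    SimpleNamespace(parsing_report={"accuracy": 99.0, "whitespace": 12.5}),
    SimpleNamespace(parsing_report={"accuracy": 61.3, "whitespace": 70.0}),
]
good_tables = keep_good_tables(fake_tables)
```

With real Camelot output, each surviving table's `df` attribute is the pandas DataFrame mentioned above.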

See comparison with other PDF table extraction libraries and tools.

197 questions
2
votes
4 answers

ModuleNotFoundError: No module named 'camelot'

I want to extract tables from a PDF, and for that I used Camelot. But I'm getting this error whenever I try to import it: import camelot Traceback (most recent call last): File "", line 1, in
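One possible cause, stated here as an assumption because the excerpt is truncated, is a mismatch between package and module names: the library is published on PyPI as `camelot-py` but imported as `camelot`, and it may also have been installed into a different interpreter than the one running the script. A minimal sketch for checking both (the helper name `module_available` is hypothetical):

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


print("Interpreter:", sys.executable)
if module_available("camelot"):
    print("camelot is importable")
else:
    # The PyPI distribution name is camelot-py; the import name is camelot.
    print("camelot not found; try:", sys.executable, "-m pip install camelot-py")
```

Running `pip` through `sys.executable -m pip` installs into the same interpreter that reported the error.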
2
votes
0 answers

GhostscriptError: -100 while using camelot-py

Windows 10, Python 3.7.4, Ghostscript 9.5.2. After having this problem I installed camelot from the repository. I haven't had this error again, but got this new one: File "data_extract2.py", line 17, in get_data_tables return…
reefette
  • 21
  • 1
2
votes
1 answer

Camelot-py not detecting two lines of text in one row

I am scraping table data from a PDF using Camelot-py, and it is not picking up stacked lines of text (see rows 9 and 10 below). Rows 9 and 10 are void of text for account.…
Logan McNulty
  • 73
  • 1
  • 7
2
votes
2 answers

Getting a 'CalledProcessError.... returned non-zero exit status 1' on running tabula.read_pdf() function on python 3.6

I have tried all possible options. Please help. I am getting the following error while running tabula's read_pdf() in Python. The error is CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar',…
Sounak Banerjee
  • 99
  • 1
  • 10
2
votes
2 answers

How to iterate through a list of DataFrames and drop all data if a specific string isn't found

I am using the Python library Camelot to parse multiple PDFs and pull out all tables within those PDF files. The first line of code returns all of the tables that were scraped from the PDF as a list. I am looking for one table in…
Josiah Hulsey
  • 499
  • 1
  • 7
  • 26
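A minimal sketch of one way to do this with plain pandas, assuming the tables are already available as a list of DataFrames (with Camelot, something like `[t.df for t in tables]`). The helper name `frames_containing` and the sample data are hypothetical, and the search is a plain substring match over every cell.

```python
import pandas as pd


def frames_containing(frames, needle):
    """Keep only the DataFrames in which at least one cell contains `needle`."""
    kept = []
    for df in frames:
        # Cast every cell to str so the substring test also works on numbers.
        as_text = df.astype(str)
        mask = as_text.apply(lambda col: col.str.contains(needle, regex=False))
        if mask.to_numpy().any():
            kept.append(df)
    return kept


# Hypothetical sample data standing in for extracted tables.
frames = [
    pd.DataFrame({"0": ["Account", "1001"], "1": ["Balance", "50.00"]}),
    pd.DataFrame({"0": ["Name", "Ann"], "1": ["City", "Oslo"]}),
]
matches = frames_containing(frames, "Balance")
```

Building a new list rather than deleting from the original while iterating avoids skipping elements.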
2
votes
2 answers

Python-Camelot extracting empty tables

I am using Camelot to extract multiple sections of a PDF by the following command. cgl_section = camelot.read_pdf(filename, flavor='stream', table_areas=['35,490,155,483', '53,480,110,470', '117,480,155,470', …
A.A. F
  • 349
  • 5
  • 16
1
vote
0 answers

camelot.ext.ghostscript._gsprint.GhostscriptError: -100 while using lattice flavour in camelot

When I try to use the lattice flavor in camelot.read_pdf, it throws the error camelot.ext.ghostscript._gsprint.GhostscriptError: -100. The stream flavor works just fine. Here is my code: import camelot tables = camelot.read_pdf("test.pdf", flavor="lattice",…
Anirudh
  • 23
  • 2
1
vote
1 answer

Need to install Ghostscript to Mac PATH

Getting an error with Camelot, "Ghostscript is not installed". Tried everything; the issue is that it is not added to PATH, although gs IS installed on the machine. Failing the following check from Camelot install page…
Dmul
  • 11
  • 2
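A minimal diagnostic sketch using only the standard library, showing whether the `gs` executable and a Ghostscript shared library are discoverable from the current Python process. It is not necessarily the check the asker refers to, which is elided above; the helper name `ghostscript_status` is hypothetical.

```python
import os
import shutil
from ctypes.util import find_library


def ghostscript_status():
    """Report whether the gs executable and gs shared library can be found."""
    return {
        # Full path of the gs binary if it is on PATH, else None.
        "gs_executable": shutil.which("gs"),
        # Name or path of the gs shared library if the loader finds it, else None.
        "gs_library": find_library("gs"),
        # The PATH entries this Python process actually sees.
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
    }


status = ghostscript_status()
for key, value in status.items():
    print(key, "=", value)
```

If `gs_executable` is `None` even though `gs` runs in a terminal, the Python process was probably started with a different `PATH` than the shell, for example from an IDE.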
1
vote
1 answer

How to use Camelot-py to split rows when text exists in a specific column

I am trying to extract table information from a PDF using the Camelot-py library. Initially I used the stream flavor like this: import camelot tables = camelot.read_pdf('sample.pdf', flavor='stream', pages='1', columns=['110,400'], split_text=True,…
1
vote
0 answers

Camelot PDF extraction has an issue when copying text across spanning cells

I am extracting data from PDFs using camelot and have run into the following issue on the third page of this datasheet. The problematic table is shown below: The issue is an inconsistency in how the content of spanning cells is copied. As you can see on the…
Said Akyuz
  • 180
  • 1
  • 1
  • 11
1
vote
2 answers

Find the row and column index of a particular value in a DataFrame

I have to find the row and column index of a particular value in a DataFrame. I have code to find the row index based on the column name, but I am not sure how to find both the row and column indexes. Current…
Pravin
  • 241
  • 2
  • 14
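A minimal sketch of one common approach, assuming an exact-equality match is wanted: compare the underlying array against the value and map the matching positions back to labels. The helper name `locate` and the sample DataFrame are hypothetical.

```python
import numpy as np
import pandas as pd


def locate(df, value):
    """Return (row_label, column_label) pairs for every cell equal to `value`."""
    rows, cols = np.where(df.to_numpy() == value)
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]


# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 2]}, index=["x", "y"])
positions = locate(df, 2)
print(positions)
```

The positional indexes are available directly as the `rows` and `cols` arrays if labels are not needed.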
1
vote
1 answer

Camelot - detecting hyperlinks within table

I am using Camelot to extract tables from PDF files. While this works very well, it extracts the text only, it does not extract the hyperlinks that are embedded in the tables. Is there a way of using Camelot or a similar package to extract table…
Amy D
  • 55
  • 1
  • 6
1
vote
1 answer

Extract table from Image PDF

The job is to extract the table from an image PDF. I tried using Camelot and tabula, but nothing worked. Any suggestions on how I can extract the tables? Attached the image of the table here : Neither Camelot nor tabula detects the table. Attached…
Pravin
  • 241
  • 2
  • 14
1
vote
0 answers

How to skip image-based pages in camelot?

I'm running a for loop over multiple PDFs with multiple pages to extract multiple tables. The problem is that when I run the for loop over multiple PDFs, if there are any PDFs that contain image-based pages at page 1 or 2 and tables start from page 2 or 3…
redox741
  • 21
  • 5
1
vote
1 answer

Failed to install cryptography in Android Studio using Chaquopy

I want to use camelot-py in Android Studio using Chaquopy. But during installation of camelot-py, Gradle is unable to install cryptography. Chaquopy version : 12.0.1 Android Gradle Plugin Version : 7.2.2 minSDK : 21 build.gradle (top-level): plugins…