Questions tagged [python-camelot]

Camelot is a Python library that makes it easy for anyone to extract tabular data from PDF files.

Official web site

Why Camelot?

  • You are in control. Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
  • Bad tables can be discarded based on metrics like accuracy and whitespace, without ever having to manually look at each table.
  • Each table is a pandas DataFrame, which seamlessly integrates into ETL and data analysis workflows.
  • Export to multiple formats, including JSON, Excel and HTML.
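As a rough illustration of the second point above, here is a minimal sketch of discarding tables by their parsing metrics. It assumes each extracted table exposes a `parsing_report` dict with `accuracy` and `whitespace` entries; the helper names `keep_good_tables` and `extract_good_tables` and the threshold values are hypothetical, chosen only for this example, and should be tuned per document.

```python
from types import SimpleNamespace


def keep_good_tables(tables, min_accuracy=90.0, max_whitespace=30.0):
    """Return the tables whose parsing report passes both thresholds.

    The thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for table in tables:
        report = table.parsing_report  # dict with 'accuracy', 'whitespace', ...
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            kept.append(table)
    return kept


def extract_good_tables(pdf_path, pages="1", flavor="lattice"):
    """Read tables from a PDF with Camelot and drop the low-quality ones."""
    import camelot  # imported lazily so the filter above works without Camelot installed

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return keep_good_tables(tables)


# Stand-in objects that only mimic the parsing_report attribute, for illustration.
fake_tables = [
    SimpleNamespace(parsing_report={"accuracy": 99.1, "whitespace": 12.3}),
    SimpleNamespace(parsing_report={"accuracy": 45.0, "whitespace": 80.0}),
]
good = keep_good_tables(fake_tables)
print(len(good))
```

With a real PDF, each table returned by `extract_good_tables` should expose its data as a pandas DataFrame through `.df`, which can then be written out with the usual pandas or Camelot export methods.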

See comparison with other PDF table extraction libraries and tools.

197 questions
0
votes
1 answer

Python Library Camelot not reading all tables in one page

I'm using the Camelot Python library to read all tables on a page of a PDF document. I'm trying to read all tables on page 10 of this PDF. I tried to debug by plotting the page, and I noticed something if I change the flavor: This is with flavor lattice This is…
0
votes
0 answers

Camelot scraping issue for Non English (Tamil) PDF

Python Camelot works like a charm when it comes to English. But when it comes to Tamil, it's not scraping the words properly. It gives more or less junk characters close to the actual characters. I would like to understand what the issue is and how it captures…
sibi kanagaraj
  • 101
  • 1
  • 10
0
votes
0 answers

Bad encoding using Camelot

I am using Camelot to parse a document. To keep it simple, I am now debugging with the most basic command: all_pages = camelot.read_pdf(str(file_path)) for table_info in all_pages: df = table_info.df print(df) I am applying this to two…
Pablo
  • 1,373
  • 16
  • 36
0
votes
1 answer

camelot in python doesn't recognize all tables

I use camelot in python for table extraction from pdf file. I have code as follows: tables=camelot.read_pdf(r'file_to_path' ,flavor='lattice',pages='1' ,shift_text=[''] ) The…
data_b77
  • 415
  • 6
  • 19
0
votes
1 answer

Camelot not detecting table within table

I have observed that camelot is not detecting nested tables in the sample document I have. In the attached image, I'm getting only one table, extracted as a whole. Is there any way to detect the inner tables as well?
Megha Sirisilla
  • 151
  • 2
  • 12
0
votes
0 answers

ModuleNotFoundError: No module named 'camelot' | Works in Jupyter Notebooks but not VSCode

Goal: to get a working version of this tutorial with PDF, via Visual Studio Code. I am trying to install camelot via VSCode, using Poetry, but am having dependency problems. This works in Jupyter Notebooks (bottom of post), but I am attempting to…
0
votes
0 answers

Scraping tables from various PDF-files

I am figuring out how to loop through various multi-page PDF files and scrape their tables nicely into Excel files. However, camelot and tabula are unable to process the PDF files: # pip install --upgrade camelot-py[cv] tabula-py excalibur-py import…
0
votes
1 answer

How to extract all arrays in a pdf?

Is there a way to extract data from every array in a PDF using Python? I've tested tabula, camelot, and pdfplumber, but none can extract everything, or extract it correctly. An example: I would like to work on these using matrix, dataframe, ... Should I opt for…
Trokken
  • 37
  • 4
0
votes
1 answer

Camelot-py not detecting tables with two rows

I am scraping table data from a PDF using Camelot-py, and it is not detecting tables with only one or two rows. PDF I am trying to read: Code used to read tables: abc = camelot.read_pdf('IR-O-U-0436.pdf', pages="all") The output I am getting: From the images,…
0
votes
1 answer

Camelot dependencies - pandas required?

Good morning, I'm in the process of getting Camelot approved for use in my office to help with some projects but need a complete list of dependencies to provide before install. Camelot only lists Tkinter and Ghostscript as dependencies, but…
Xlite
  • 5
  • 1
  • 2
0
votes
1 answer

Camelot in python does not behave as expected

I have two PDF documents, both in the same layout but with different information. The problem is: I can read one perfectly, but in the other one the data is unrecognizable. This is an example which I can read perfectly, download here: from_pdf =…
Gizelly
  • 417
  • 2
  • 10
  • 24
0
votes
0 answers

camelot-py: ccv2.error: OpenCV(4.5.3) error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'

I want to get some data from a table in a PDF file with the camelot-py library in my Django project. But when I try to run some simple code, it raises a traceback: Traceback (most recent call last): File "", line 1, in File…
Garry
  • 1
  • 1
0
votes
0 answers

How to create executable file with camelot module?

I am trying to create an executable file from my script by using auto-py-to-exe. Once I run this exe file, the following error occurs: Traceback (most recent call last): File "GUI_PDF_scraper.py", line 5, in ModuleNotFoundError: No module named…
0
votes
1 answer

How to provide table areas as an input in camelot-Python

I am making a Python script where the user can provide a PDF and the table areas, and then it extracts the table and converts it into a CSV file. But how do I take an input here and add it to the command? import camelot import pandas as pd pdf_line =…
0
votes
0 answers

Some tables are missing while extracting from PDF using Camelot

I tried to extract table data from a multi-page, multi-table PDF using the following code: import camelot tables = camelot.read_pdf('InputPDF.pdf',flavor='stream',multiple_tables=True,pages='all') tables.export('foo1.csv', f='csv', compress=True) # json,…