Questions tagged [python-camelot]

Camelot is a Python library that makes it easy for anyone to extract tabular data from PDF files.

Official web site

Why Camelot?

  • You are in control. Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
  • Bad tables can be discarded based on metrics like accuracy and whitespace, without ever having to manually look at each table.
  • Each table is a pandas DataFrame, which seamlessly integrates into ETL and data analysis workflows.
  • Export to multiple formats, including JSON, Excel and HTML.
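As a rough illustration of discarding bad tables by metric, here is a minimal sketch. It assumes each extracted table exposes a `parsing_report` dictionary with `accuracy` and `whitespace` percentages, as Camelot's table objects do. The helper name `keep_good_tables` and the threshold values are hypothetical, chosen only for the example.

```python
from types import SimpleNamespace


def keep_good_tables(tables, min_accuracy=90.0, max_whitespace=30.0):
    """Return the tables whose parsing report passes both thresholds.

    Assumes every table has a `parsing_report` dict with 'accuracy' and
    'whitespace' keys (both percentages). Thresholds are illustrative.
    """
    good = []
    for table in tables:
        report = table.parsing_report
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            good.append(table)
    return good


# Stand-in objects so the sketch runs without a PDF. With Camelot installed,
# the result of camelot.read_pdf("your.pdf") could be passed in instead.
demo_tables = [
    SimpleNamespace(parsing_report={"accuracy": 99.1, "whitespace": 12.0}),
    SimpleNamespace(parsing_report={"accuracy": 54.3, "whitespace": 71.5}),
]
kept = keep_good_tables(demo_tables)
print(len(kept))
```

Suitable thresholds depend on the documents involved, so the defaults above should be treated as a starting point rather than a recommendation.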

See comparison with other PDF table extraction libraries and tools.

197 questions
0
votes
1 answer

Upload images to cloud and then paste the respective link to a respective dataframe

I have PDFs that contain tables along with an image diagram related to the table contents; both the table and the image are on a single page. I've extracted the tables using the Camelot library and the images using the Fitz library, in Python. Now I want to upload those…
0
votes
1 answer

Borderless PDF extraction to JSON is not working properly with the Python Camelot library

Can anyone give me a quick answer or some help? We are facing an issue: PDF extraction to JSON using Python Camelot does not give the exact content. Some content is missing after extraction.
Goutam Ghosh
  • 87
  • 1
  • 7
0
votes
1 answer

How to find the area coordinates of an invoice table in a PDF file using Python?

How can I find the area coordinates of an invoice table in a PDF file using Python? I am currently using Camelot or Tabula for table extraction from PDF files. However, I would like to know if there is a way to extract the area coordinates of each table so that I…
0
votes
0 answers

Converting pdf to excel (getting specific tables using Camelot)

I'm using Camelot to read a PDF and print out tables, but it doesn't seem to read the tables as expected. I used a PDF-to-Excel converter from a website and got the results I expected, so I assume the tables exist. I also highlighted the PDF and…
J. Doe
  • 269
  • 1
  • 8
0
votes
1 answer

try except IndexError - I am not getting the desired result

I am trying to read PDF files and to convert them to clean data frames in Python. I loop through all relevant pages and want to append the data frames step-by-step to get one big table with all information. Pages 32-33 need a slightly different…
Florian Seliger
  • 421
  • 4
  • 16
0
votes
2 answers

How to include Ghostscript with cx_freeze

Is there a way to include Ghostscript with cx_freeze in a virtualenv? I have tried this: pip install python3_ghostscript-0.5.0-py3-none-any.whl, but I am still getting the error below. I downloaded the .whl file from this link -…
Rocky
  • 950
  • 1
  • 7
  • 12
0
votes
0 answers

Importing camelot in Python is showing Relink error and segmentation fault

I have been using camelot to extract tables from a pdf file and the code is working fine in my local setup. But when I run the same code in my DigitalOcean droplet, this error comes up after importing camelot python3: Relink…
prabhupant
  • 31
  • 8
0
votes
0 answers

How do I solve the camelot-py read_pdf error "EOF marker not found"?

I'm using a text-based pdf, as required, and trying to read the tables off it using the flavor='stream' option. When I run the python script, this error shows up: File "/path/foo.py", line x, in File "/path/foo.py", line x, in read_pdf File…
pandaero
  • 27
  • 5
0
votes
0 answers

Print tables extracted using the Python Camelot library

Is there a way to get the stdout of the extracted tables of the PDFs to print in the terminal? Example: import camelot tables = camelot.read_pdf('List.pdf') tables.export('newpdf.json', f='json') for row in tables[0] #trying to print table... …
answerSeeker
  • 2,692
  • 4
  • 38
  • 76
0
votes
1 answer

What is better, read all pages at once or page by page in python-camelot?

I will run Camelot on a simple DigitalOcean instance (1 vCPU, 1 GB RAM) every day to extract information from a PDF with roughly 150 pages and store it in a database. What would be the best practice for this: a) read all pages at once…
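One middle ground between the two extremes is to read the PDF in fixed-size page batches, which keeps only a slice of the document in memory at a time. The sketch below only builds the page-range strings. It assumes they are then passed to the `pages` argument of `camelot.read_pdf`, which accepts ranges such as `'1-40'`. The helper name `page_batches` and the batch size are hypothetical, and whether batching actually helps on a given machine is something to measure.

```python
def page_batches(total_pages, batch_size):
    """Yield page-range strings such as '1-40', '41-80', ... covering 1..total_pages."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        yield f"{start}-{end}"


batches = list(page_batches(total_pages=150, batch_size=40))
print(batches)  # ['1-40', '41-80', '81-120', '121-150']

# Hypothetical usage with Camelot installed (file name is a placeholder):
# import camelot
# for page_range in batches:
#     tables = camelot.read_pdf("report.pdf", pages=page_range)
#     ...store the rows from each table, then let `tables` go out of scope...
```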
0
votes
0 answers

Converted PDF tabular data into CSV; how do I store it in a database?

I have converted PDF table data into CSV in my Django project using Camelot, and the file is automatically stored in my root directory. Now, how do I put my CSV data into my MySQL database? I have created my model to match the CSV file's row names. Can anyone…
zenvar
  • 19
  • 8
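For orientation, here is a minimal sketch of loading CSV rows into a SQL table. It deliberately swaps in Python's built-in `sqlite3` module for MySQL and Django so that it runs anywhere; the table name, sample data, and helper name `load_csv_rows` are made up for the example. In a Django project the same loop would more likely go through the ORM, so treat this as an outline of the steps rather than the recommended approach.

```python
import csv
import io
import sqlite3


def load_csv_rows(csv_file, conn, table_name="extracted_rows"):
    """Insert every data row of a CSV (first row = header) into a SQL table.

    Column names are taken from the header and interpolated into the SQL,
    so this is only appropriate for CSV files you trust.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})')
    # Skip malformed rows whose length does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# In-memory stand-ins for the exported CSV file and the real database.
sample_csv = io.StringIO("item,price\nwidget,4.50\ngadget,12.00\n")
conn = sqlite3.connect(":memory:")
inserted = load_csv_rows(sample_csv, conn)
print(inserted)
```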
0
votes
0 answers

How to insert PDF table data into database

I have extracted PDF table data using Camelot, but how can I now put my table data into my database? Do I need to convert it into CSV, or is there another way to put it into my database? And is there any other way to choose my specific…
zenvar
  • 19
  • 8
0
votes
0 answers

Python Camelot - export one PDF file to one converted file

Python 3.7 with Camelot 0.7.3. By default, Camelot exports separate converted files for each page of the pdf file. I need it so that one pdf file exports to one converted file (HTML conversion is what we use), regardless of how many pages the pdf…
Stpete111
  • 3,109
  • 4
  • 34
  • 74
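One possible workaround is to skip the per-page export and build a single HTML document yourself. The sketch below only shows the combining step. It assumes each table's HTML fragment has already been produced, for example with `table.df.to_html()`, since each Camelot table is a pandas DataFrame. The helper name `combine_html` and the sample fragments are hypothetical.

```python
import html


def combine_html(fragments, title="Extracted tables"):
    """Wrap several HTML table fragments in one minimal HTML document."""
    body = "\n<hr>\n".join(fragments)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f'<meta charset="utf-8">\n<title>{html.escape(title)}</title>\n'
        f"</head>\n<body>\n{body}\n</body>\n</html>\n"
    )


# Stand-in fragments; with Camelot these could come from
# [table.df.to_html() for table in tables].
fragments = [
    "<table><tr><td>page 1</td></tr></table>",
    "<table><tr><td>page 2</td></tr></table>",
]
page = combine_html(fragments, title="Report & tables")
print(page)
```

The returned string can then be written to one `.html` file per source PDF.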
0
votes
1 answer

How to extract data from multiple PDFs in the same directory using python-camelot?

I'm trying to extract data from multiple tables in multiple PDFs and save it in CSV format. I did my research and found that python-camelot is a good tool for extraction. I tried it and it works perfectly fine on a single PDF. However, I have over 50 PDFs…
Ahmad B
  • 1
  • 1
  • 3
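A common pattern for this kind of batch job is to list the PDFs with `pathlib` and derive an output path for each before running the extraction. The sketch below covers only that bookkeeping, with empty placeholder files standing in for real PDFs; the helper name `plan_exports` and the file names are hypothetical. The Camelot calls are left as comments because they need real PDFs to run.

```python
import tempfile
from pathlib import Path


def plan_exports(pdf_dir, out_dir):
    """Pair every PDF in pdf_dir with a CSV path of the same stem in out_dir."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    return [(pdf, out_dir / f"{pdf.stem}.csv") for pdf in sorted(pdf_dir.glob("*.pdf"))]


# Demo with empty placeholder files instead of real PDFs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_report.pdf", "a_report.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    plan = plan_exports(tmp, Path(tmp) / "csv")
    names = [(pdf.name, csv_path.name) for pdf, csv_path in plan]

print(names)  # [('a_report.pdf', 'a_report.csv'), ('b_report.pdf', 'b_report.csv')]

# Hypothetical usage with Camelot installed and real PDFs in place:
# import camelot
# for pdf, csv_path in plan_exports("pdfs", "csv_out"):
#     csv_path.parent.mkdir(parents=True, exist_ok=True)
#     tables = camelot.read_pdf(str(pdf), pages="all")
#     tables.export(str(csv_path), f="csv")  # expect one output file per table
```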
0
votes
1 answer

Why is it always "module 'xxx' has no attribute 'xxx'"?

I am using PyCharm Professional. What is bugging me today is that lots of modules I call don't seem to work, for example plotly, tabula-py, and camelot. From the attached pic below, you can see that I am even working in a virtual environment, and just did pip…
yts61
  • 1,142
  • 2
  • 20
  • 33