Questions tagged [tabula-py]

tabula-py is a Python wrapper for tabula-java that lets you extract tables from PDF files into pandas DataFrames or JSON. It can also convert the tables in a PDF directly into a CSV, TSV, or JSON file.

Installing tabula-py using pip:

pip install tabula-py
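As a minimal sketch, assuming tabula-py is installed and a Java runtime is available (tabula-py drives tabula-java), reading tables into DataFrames and converting a PDF straight to CSV look roughly like this. `tabula.read_pdf` and `tabula.convert_into` are the library's own functions; the wrapper names `extract_tables` and `pdf_to_csv` are hypothetical.

```python
def extract_tables(pdf_path, pages="all"):
    """Return the tables tabula-py detects in pdf_path as a list of pandas DataFrames."""
    # Imported lazily so this sketch can be loaded even where tabula-py is not installed.
    import tabula

    # read_pdf returns a list with one DataFrame per detected table.
    return tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)


def pdf_to_csv(pdf_path, csv_path, pages="all"):
    """Write the tables found in pdf_path straight to csv_path without building DataFrames."""
    import tabula

    # output_format may also be "tsv" or "json".
    tabula.convert_into(pdf_path, csv_path, output_format="csv", pages=pages)
```

For example, `extract_tables("report.pdf")` (a placeholder file name) returns a list you can index or loop over, not a single DataFrame.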
132 questions
1
vote
0 answers

Reading a table with blank cells with tabula-py

I am trying to load a large table (an example is attached) from a Form 10-K into Python using tabula-py. The table does not have clear borders and has a lot of blank cells, which causes several issues. My code is df =…
ynchoir
1
vote
0 answers

Tabula doesn't work after converting python script to .exe

I am using tabula and Python to write a web-scraping script. I tested it and it worked, and now I need to convert the .py file into an .exe so that it can run on my company's computers (my office has banned installing Python). However, when I…
Julian Chu
1
vote
0 answers

Python- Exporting a Dataframe into a csv

I'm trying to write a DataFrame to a CSV file using pandas, but I'm getting the following error: AttributeError: 'list' object has no attribute 'to_csv'. I believe my syntax is correct, but could anyone point out where it is wrong…
dsilva
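For context on this error: `tabula.read_pdf` returns a list of DataFrames (one per detected table), so calling `to_csv` directly on its result raises exactly this AttributeError. Assuming the list in the question comes from `read_pdf`, here is a minimal sketch that stacks the tables before writing. `tables_to_csv` is a hypothetical helper name, and stacking only makes sense when the tables share the same columns.

```python
import pandas as pd


def tables_to_csv(tables, csv_path=None):
    """Stack a list of DataFrames (e.g. from tabula.read_pdf) and write them as CSV.

    With csv_path=None, pandas returns the CSV text instead of writing a file.
    """
    if not tables:
        raise ValueError("no tables to write")
    combined = pd.concat(tables, ignore_index=True)
    return combined.to_csv(csv_path, index=False)
```

If the tables have different layouts, write them one at a time instead, e.g. `for i, df in enumerate(tables): df.to_csv(f"table_{i}.csv", index=False)`.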
0
votes
0 answers

About the error "lines must be orthogonal, vertical and horizontal" in tabula-py

When I parse a PDF file with tabula-py in Python, I get the following error: Exception in thread "main" java.lang.IllegalArgumentException: lines must be orthogonal, vertical and horizontal at…
atk
0
votes
0 answers

Reading complex tables from pdf

I have a PDF file (for now it's just one for testing, but I would like to find a generic solution, as I'll have more files in the near future) containing a couple of tables with different formats, and the language of the tables is RTL (not LTR). Another…
Alon
0
votes
1 answer

FileNotFoundError: [WinError 2] -python

I'm new to Python and I'm getting this error when trying to execute the following code, which aims to take the contents of this PDF and put it in an Excel document. My OS is Windows 10 and I'm using VS Code via Anaconda3. I'm not sure what I'm doing…
shattv
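One frequent cause of `[WinError 2]` in tabula-py code (an assumption here, since the question's code is elided) is that the `java` executable tabula-py needs cannot be found on PATH, rather than the PDF itself being missing. A small stdlib check, where `java_available` is a hypothetical helper name:

```python
import shutil
import subprocess


def java_available():
    """Return True if a `java` executable is on PATH and `java -version` succeeds."""
    java = shutil.which("java")
    if java is None:
        return False
    # `java -version` writes its banner to stderr and exits with status 0 on a working install.
    result = subprocess.run([java, "-version"], capture_output=True)
    return result.returncode == 0
```

If it returns False, installing a Java runtime and adding its `bin` directory to PATH is the first thing to try.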
0
votes
2 answers

Tabula-py: Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full

I installed both the tabula-py library and also Java to try and scrape tables from PDFs. I ran some simple code below with a sample pdf I found online: from tabula import read_pdf path =…
RIPPLR
0
votes
0 answers

Python Tabula: Reading in PDF to Python as Pandas Dataframe

I'm scraping PDF data from a website, but they changed their PDF formatting, so I can no longer use the solution that worked for every other PDF, and I'm unsure of an alternative method. Hello everyone, I am trying to pull a PDF from the following website (in the…
jare2620
0
votes
0 answers

Stop tabula-py from printing "The output file is empty"

So I'm currently writing some code to scrape a bunch of PDFs for information; however, I don't want all the pages to be returned, since some aren't useful. I've already solved that, but I keep getting a message saying "The output file is empty". I…
jdah97
0
votes
1 answer

How to use tabula on AWS Lambda to read pdf?

I know that we have to download Java for tabula to run; I did that in my IDE and it worked. But I don't know how to install it on AWS Lambda. If anyone could help me with that, I would appreciate it. I think the code itself produces what I am expecting,…
0
votes
0 answers

Keep getting this error: raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path) FileNotFoundError: [Errno 2] No such file or di

I am trying to improve my basic Python skills by tinkering with some code generated with the help of ChatGPT. This is the code I have for converting PDF to Excel. I keep getting an error that the file is not found, but it is obviously there. I copied the…
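Without the elided code this is only a guess, but a common reason a file that "is obviously there" is reported missing is that a relative path is resolved against the current working directory, which is not necessarily the folder the script lives in. A sketch that resolves the path explicitly and reports where Python actually looked; `resolve_pdf_path` is a hypothetical helper:

```python
from pathlib import Path


def resolve_pdf_path(name, base_dir=None):
    """Resolve name against base_dir (default: current working directory) and check it exists."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # An absolute name (or one starting with ~) overrides base when joined.
    path = (base / Path(name).expanduser()).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found (working directory: {Path.cwd()})")
    return path
```

Passing `base_dir=Path(__file__).parent` resolves the name next to the script instead of wherever the program was launched from.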
0
votes
0 answers

Table extraction PDF

I want to write code that extracts tables from PDFs, and so far I have used tabula-py and camelot. For now camelot seems to work better, but for some complex tables (like borderless tables nested inside other tables) it still doesn't work…
Andreea Elena
0
votes
2 answers

Wrong output in writing list of pandas dataframe to separate sheets in same excel

I have code where I am using tabula-py to read tables from a PDF and then write the resulting list of DataFrames to a single Excel file, with each DataFrame in a separate sheet. Here is my current code: def read_pdf(pdf_file): output_filepath =…
user2966197
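A minimal sketch of the goal described in this question (one workbook, one sheet per DataFrame), assuming pandas with an Excel engine such as openpyxl installed. `write_tables_to_excel` and the `Table_N` sheet names are hypothetical. The key point is opening a single `ExcelWriter` for the whole loop: with the default write mode, creating a new writer per table recreates the workbook each time and keeps only the last sheet.

```python
import pandas as pd


def write_tables_to_excel(tables, xlsx_path):
    """Write each DataFrame in tables to its own sheet of one workbook."""
    # One writer for the whole loop, so every sheet ends up in the same file.
    with pd.ExcelWriter(xlsx_path) as writer:
        for i, df in enumerate(tables, start=1):
            df.to_excel(writer, sheet_name=f"Table_{i}", index=False)
```

Called with the list returned by `tabula.read_pdf`, this produces sheets `Table_1`, `Table_2`, and so on.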
0
votes
0 answers

Tabula-py - Pdf Extraction

While extracting a table from a PDF using tabula, the last 3 rows are not extracted. Can anyone let me know where I'm going wrong? I used read_pdf and passed the path, pages=all, multiple_table=True and stream=True as parameters
0
votes
0 answers

In my DataFrame, when accessing the data from the second page of the PDF file, one column's data is deleted and the remaining columns' data is shifted to the left

I tried converting a PDF to a CSV file using Python. I'm looking for help with my error: the column headings do not match the column data, because the sl.no. data is not displayed in the…