Questions tagged [tabula]

Tabula is a Java library and command line tool for extracting tables from PDF documents.

Tabula allows you to extract data from PDF files into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use graphical user interface. It works on Mac, Windows and Linux.

309 questions

votes

1 answer

How do I know if a jruby module is even installed?

I'm super unfamiliar with Ruby, Rails, and JRuby, but I really want to try out tabula-extractor. I believe I installed it properly, although I don't know how to check. This is my script, based on their initial suggestion: #!/usr/bin/jruby require…

asked Apr 17 '14 at 17:07

Amanda

-1

votes

2 answers

Dropping rows based on a string in a table

My code to drop rows based on a partial string match is not working. It is very simple and runs without errors, but it doesn't drop the rows I want. The original table in the PDF looks like…

python pandas dataframe pdf tabula

asked Nov 20 '22 at 18:05

Karl

-1

votes

1 answer

tabula-py fails on some PDF files

I'm trying to extract tables from some PDFs with tabula (Python), and I get the error below for some of the files. tables = read_pdf(file_path, pages = 'all') Error from tabula-java: Error: File does not exist Traceback (most recent call last): …

python tabula

asked Sep 11 '22 at 13:33

regulus
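
For the "File does not exist" error in the question above, one thing worth ruling out first is a wrong or relative path reaching tabula-java. The following is a minimal sketch under that assumption (the real cause may be something else); `resolve_pdf_path` is a hypothetical helper for illustration, not part of tabula-py.

```python
from pathlib import Path


def resolve_pdf_path(file_path):
    """Return an absolute Path to an existing file, or raise FileNotFoundError.

    Hypothetical helper for illustration; not part of tabula-py.
    """
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    return path


# Assumed usage (requires tabula-py and Java, so it is left commented out):
# from tabula import read_pdf
# tables = read_pdf(str(resolve_pdf_path(file_path)), pages="all")
```

Failing early in Python gives the exact absolute path that was checked, which is usually easier to act on than the message relayed from the Java process.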

-1

votes

2 answers

Python Reformat Dataframe

I am trying to iterate through the dataframe and, if a row's Age column is empty, move the value in its Name column to the Location column of the previous row. Is there a quick way to do this? As-Is To-Be

python pandas tabula

asked Dec 31 '21 at 04:48

Jack Wilson

-1

votes

1 answer

Couldn't resolve a key not in index error

This is the link to the PDF file from which I want to extract data: def onlyenglish(text): import re alphabet_regular_expression = re.compile("[^a-zA-Z|()]") text = re.sub(alphabet_regular_expression,"",text) return text …

python-3.x regex tabula

asked May 12 '21 at 05:05

Nilay Sinha
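
The helper quoted in the question above removes every character matched by the class `[^a-zA-Z|()]`. Below is a minimal sketch of that behaviour, reconstructed from the excerpt only; the sample string is made up. Note that inside a character class `|` is a literal pipe rather than alternation, so pipes are kept along with letters and parentheses.

```python
import re

# Same pattern as in the excerpt: anything that is NOT a letter, "|", "(" or ")".
NON_ENGLISH = re.compile(r"[^a-zA-Z|()]")


def onlyenglish(text):
    """Remove every character matched by NON_ENGLISH from text."""
    return NON_ENGLISH.sub("", text)


sample = "Hello, World! (2021)|x"  # made-up input for illustration
cleaned = onlyenglish(sample)
print(cleaned)
```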

-1

votes

1 answer

HTTP Error 403: Forbidden with Tabula/Requests

I am getting the error "urllib.error.HTTPError: HTTP Error 403: Forbidden" with Tabula. Is there a way to fix this? It has worked correctly for most of this year: import tabula from bs4 import BeautifulSoup import requests url =…

python web-scraping beautifulsoup python-requests tabula

asked Dec 10 '20 at 14:28

Sanch
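
A 403 when a script fetches a URL is often, though not always, the server rejecting the client's default User-Agent. The sketch below assumes that is the cause in the question above: it downloads the PDF with an explicit User-Agent using the standard library, so the local file can then be handed to tabula. The URL, the header value, and both helper names are hypothetical placeholders.

```python
import urllib.request

# Hypothetical header value; a browser-like string may or may not satisfy the server.
USER_AGENT = "Mozilla/5.0 (compatible; example-script)"


def build_pdf_request(url):
    """Create a Request carrying an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_pdf(url, dest):
    """Fetch url with the custom header and write the response bytes to dest."""
    with urllib.request.urlopen(build_pdf_request(url)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest


# Assumed usage (needs network access, tabula-py and Java, so it is commented out):
# import tabula
# local = download_pdf("https://example.com/report.pdf", "report.pdf")
# tables = tabula.read_pdf(local, pages="all")
```

Separating the download from the extraction also makes it clear which step is failing: the HTTP request or the table parsing.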

-1

votes

1 answer

Converting PDF document to DataFrame

I have a PDF document with 388 pages and one table per page. I am trying to convert them to Excel or to multiple dataframes, but I'm having some difficulties: I have tried the pypdf2 and tabula libraries, but extraction stops after only one page. The…

python pandas pdf pypdf tabula

asked Dec 06 '19 at 05:30

Equan Ur Rehman

-1

votes

2 answers

tabula-py - "No module named requests"

I made "scripts.py" with the code from https://github.com/chezou/tabula-py#example, and when I run "python scripts.py" I get this error: Traceback (most recent call last): File "script.py", line 1, in <module> import tabula File…

python tabula

asked May 08 '17 at 17:26

Nick Jonson

-2

votes

1 answer

How to read a PDF on a website without downloading the file

I want to get the data from a PDF that is hosted on a website. I have tried with tabula, but it gave me the following error: CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar',…

python database pdf web-scraping tabula

asked Jan 13 '23 at 16:46

j-hugo-ta
