Questions tagged [tabula]

Tabula is a Java library and command line tool for extracting tables from PDF documents.

Tabula allows you to extract data from PDF files into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use graphical user interface. It works on Mac, Windows and Linux.

309 questions
0
votes
1 answer

How do I know if a JRuby module is even installed?

I'm super unfamiliar with Ruby, Rails, and JRuby, but I really want to try out tabula-extractor. I believe I installed it properly, although I don't know how to check. This is my script, based on their initial suggestion: #!/usr/bin/jruby require…
Amanda
-1
votes
2 answers

Dropping rows based on a string in a table

My code to drop rows based on a partial string is not working. It is very simple and runs without errors, but it doesn't drop the rows I want. The original table in the PDF looks like…
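The asker's code is not shown in the excerpt, so the following is only a minimal sketch of one common way to drop rows by partial string in pandas. The column names, the sample data, and the substring "subtotal" are all hypothetical. A frequent cause of "runs fine but drops nothing" is forgetting that filtering returns a new DataFrame, which has to be assigned to a name.

```python
import pandas as pd

# Hypothetical table standing in for one extracted from a PDF.
df = pd.DataFrame({
    "Item": ["Widget A", "Subtotal: widgets", "Gadget B", "subtotal: gadgets"],
    "Amount": [10, 10, 25, 25],
})

# Boolean mask: True where "Item" contains the substring (case-insensitive).
# na=False treats missing cells as "no match" instead of propagating NaN.
mask = df["Item"].str.contains("subtotal", case=False, na=False)

# Filtering returns a new DataFrame; df itself is left unchanged,
# so the result must be assigned to a name to have any effect.
cleaned = df[~mask].reset_index(drop=True)
print(cleaned)
```

Because str.contains interprets its pattern as a regular expression by default, passing regex=False is safer when the substring contains characters such as "(" or ".".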
Karl
-1
votes
1 answer

tabula-py does not run with some PDF files

I'm trying to extract tables from some PDFs with tabula (Python), and I get the error below for some of the files. tables = read_pdf(file_path, pages = 'all') Error from tabula-java: Error: File does not exist Traceback (most recent call last): …
regulus
-1
votes
2 answers

Python Reformat Dataframe

I am trying to iterate through the DataFrame and, if a row's Age column is empty, move the value in its Name column to the Location column of the previous row. Is there a quick way to do this? As-Is To-Be
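The reshaping described in this question can be sketched without an explicit loop. This is only an illustration under stated assumptions: the sample data is invented, and it assumes every row with an empty Age directly follows the row it belongs to and should be dropped afterwards, which the excerpt does not confirm.

```python
import pandas as pd

# Hypothetical data: rows with an empty Age hold a location in "Name".
df = pd.DataFrame({
    "Name": ["Alice", "London", "Bob", "Paris"],
    "Age": [30, None, 25, None],
    "Location": [None, None, None, None],
})

# Rows whose Age is missing are the "spilled" rows.
spill = df["Age"].isna()

# Keep Name only on spilled rows, then shift up by one so each value
# lines up with the previous row.
moved = df["Name"].where(spill).shift(-1)

# Use the moved value where there is one; otherwise keep the existing Location.
df["Location"] = moved.where(moved.notna(), df["Location"])

# Drop the spilled rows now that their values have been moved.
result = df[~spill].reset_index(drop=True)
print(result)
```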
-1
votes
1 answer

Couldn't resolve a key not in index error

This is the link to the PDF file from which I want to extract data. def onlyenglish(text): import re alphabet_regular_expression = re.compile("[^a-zA-Z|()]") text = re.sub(alphabet_regular_expression,"",text) return text …
-1
votes
1 answer

HTTP Error 403: Forbidden with Tabula/Requests

I am getting the error "urllib.error.HTTPError: HTTP Error 403: Forbidden" with Tabula. Is there a way to fix this? It has worked correctly for most of this year: import tabula from bs4 import BeautifulSoup import requests url =…
Sanch
-1
votes
1 answer

Converting PDF document to DataFrame

I have a PDF document with 388 pages and one table per page. I am trying to convert them to Excel or to multiple DataFrames but am having some difficulties. I have tried the PyPDF2 and tabula libraries, but extraction stops after only one page. The…
Equan Ur Rehman
-1
votes
2 answers

tabula-py - "No module named requests"

I made "scripts.py" with the code from (https://github.com/chezou/tabula-py#example), and when I run "python scripts.py" I get this error: Traceback (most recent call last): File "script.py", line 1, in <module> import tabula File…
-2
votes
1 answer

How to read a PDF on a website without downloading the file

I want to get the data from a PDF that is hosted on a website. I have tried tabula, but it gave me the following error: CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar',…