Highest Voted 'tabula' Questions

1 vote, 0 answers

How do I resolve a problem with Tabula?

I'm a data analyst, and first of all I would like to thank you and your friends for a wonderful tool, Tabula. I have used it periodically over recent months, and quite actively during the past week. Then, suddenly, the tool stopped working. I even…

asked Sep 04 '22 at 12:24

Karine

1 vote, 0 answers

Camelot Table Extraction Error (PdfReadWarning: incorrect startxref pointer(0) [_reader.py:938])

I am trying to extract some tables from a .pdf doc but I got an error: "PdfReadWarning: incorrect startxref pointer(0) [_reader.py:938]" The code is pretty simple because I am just testing: import camelot file = r"myPCPath\myFile.pdf" tables =…

python pdf tabula python-camelot

asked May 01 '22 at 23:33

Canal NerdZone

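The `[_reader.py:938]` location suggests the warning is raised by the PDF reader library Camelot relies on (PyPDF2 or pypdf, depending on the version), not by Camelot itself. A pointer of 0 means the file's trailer claims the cross-reference table starts at byte 0, which is where the `%PDF-` header sits. As a rough diagnostic only, the sketch below reads the raw bytes and reports whether the last `startxref` offset actually lands on an `xref` table or an object header. The function name `pdf_structure_report` is hypothetical, and this is not a full PDF parser.

```python
import re


def pdf_structure_report(data: bytes) -> dict:
    """Rough check of a PDF's header and startxref pointer (hypothetical helper, not a parser)."""
    # Readers usually tolerate a little junk before the header, so look at the first 1 KB.
    header_ok = b"%PDF-" in data[:1024]

    # The trailer near the end of the file should contain "startxref <offset>".
    offsets = re.findall(rb"startxref\s+(\d+)", data[-2048:])
    # With incremental updates there can be several; the last one is the current trailer.
    offset = int(offsets[-1]) if offsets else None

    points_at_xref = False
    if offset is not None and offset < len(data):
        target = data[offset:offset + 32]
        # Either a classic "xref" table or an xref stream object such as "12 0 obj".
        points_at_xref = target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target) is not None

    return {"header_ok": header_ok, "startxref": offset, "points_at_xref": points_at_xref}
```

For example, `pdf_structure_report(pathlib.Path(file).read_bytes())` on the file from the question would show whether the trailer really says 0 or the warning comes from somewhere else.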

1 vote, 1 answer

Tabula font error in reading table from PDF

I saw a lot of people had similar issues, but not this one. And many of the similar issues do not have an applicable solution, unfortunately. I am getting this warning from tabula. And when I look at the result or test the length of what it…

python fonts tabula

asked Apr 12 '22 at 22:21

ralbhar

1 vote, 1 answer

tabula-py ignores NaN values and shifts table cell values into the wrong column

So I was experimenting a little with tabula for Python and ran into a strange exception. The first column of the table always spans 4 rows. So for the first 4 cells, which span multiple rows, tabula just assumes NaN for the…

python pandas pdf tabula tabula-py

asked Mar 23 '22 at 13:27

mathis_dukatz

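If the only problem were the NaN cells produced by the vertically merged first column, a common workaround (not something tabula-py does for you) is to forward-fill that column, copying the last non-empty value downward. The sketch below shows the idea on a plain Python list; the helper name `forward_fill` is hypothetical. It does not repair values that tabula has already shifted into the wrong column.

```python
def forward_fill(column: list) -> list:
    """Replace missing cells (None or float NaN) with the last non-missing value above them."""
    filled = []
    last = None  # cells before the first real value stay None
    for value in column:
        # value != value is only true for NaN, which tabula-py returns via pandas.
        if value is None or value != value:
            filled.append(last)
        else:
            last = value
            filled.append(value)
    return filled
```

With pandas the equivalent is `Series.ffill()`, for example `df[df.columns[0]] = df[df.columns[0]].ffill()`.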

1 vote, 1 answer

Using multirow and multicolumn in a table in Overleaf

I am trying to make a table where the first column spans multiple columns (2 columns) and also multiple rows (2 rows). The error is in the first column (Aspects). How do I make it…

latex tabula overleaf multirow multicol

asked Feb 07 '22 at 09:15

MK Huda

1 vote, 1 answer

Error in tabula-py when specifying the area parameter

I am getting an error when I specify the area in the following code: data = tb.read_pdf(pdf_file, guess=False, stream=True, pandas_options ={'header': None}, encoding="utf-8", multiple_tables =False, area = [136,10,10,10], pages ='1', columns =…

python tabula

asked Jan 31 '22 at 11:34

Shamoun Ilyas

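tabula-py documents the `area` option as the portion of the page to analyze, given as `(top, left, bottom, right)` in PDF points, with coordinates measured from the top-left corner of the page. The last three values are coordinates, not margins, so `area = [136,10,10,10]` describes a region whose bottom (10) lies above its top (136) and whose right edge equals its left edge. That may explain the error, although the rest of the call is truncated. The hypothetical helper below flags such inverted areas before `read_pdf` is called.

```python
def check_area(area):
    """Return a list of problems with a tabula-style area [top, left, bottom, right] (points)."""
    if len(area) != 4:
        raise ValueError("area must have exactly 4 values: top, left, bottom, right")
    top, left, bottom, right = area
    problems = []
    # Coordinates grow downward and to the right, so bottom > top and right > left.
    if bottom <= top:
        problems.append(f"bottom ({bottom}) must be greater than top ({top})")
    if right <= left:
        problems.append(f"right ({right}) must be greater than left ({left})")
    return problems
```

An area such as `[136, 10, 500, 580]` (values chosen only for illustration) passes the check, because the last three numbers are read as absolute positions on the page.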

1 vote, 0 answers

Reading Tables from PDFs in S3 bucket using Camelot or Tabula packages: s3 URL

Can Python packages that pull tables from PDFs, such as Tabula and Camelot, read the PDF from an S3 bucket, as Pandas can? For example, I can read a CSV file from the S3 bucket like this: df =…

tabula python-camelot

asked Jan 27 '22 at 21:42

user14530921

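One approach, sketched here under assumptions, is to download the object's bytes yourself and hand them to the extractor as a file-like object. The first step is splitting an `s3://` URL into bucket and key; the helper name `split_s3_url` is hypothetical and uses only the standard library.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/file.pdf' into ('bucket', 'path/to/file.pdf')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    # urlparse keeps the leading slash on the path; S3 keys don't have one.
    return parsed.netloc, parsed.path.lstrip("/")
```

Assuming boto3 is installed and credentials are configured, the bytes could then be fetched with `boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()` and wrapped in `io.BytesIO`. tabula-py's `read_pdf` accepts a file-like object; another question on this page passes `io.BytesIO(content)` to it the same way. Whether Camelot accepts a file-like object depends on its version, so check its documentation.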

1 vote, 1 answer

tabula: remove line breaks when extracting a table from a PDF

I have a table with wrapped text in a PDF file, and I used tabula to extract the table from it: file1 = "path_to_pdf_file" table = tabula.read_pdf(file1,pages=1,lattice=True) table[0] However, the end result looks like this: is there a way to…

python pdf tabula

asked Jan 13 '22 at 15:05

user11666514

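In lattice mode, tabula often keeps the line breaks of wrapped cells inside the cell text (frequently as `\r`). One way to undo that after extraction is to collapse each run of line breaks, together with any surrounding whitespace, into a single space. A minimal sketch for one cell value; the helper name `unwrap_cell` is hypothetical.

```python
import re


def unwrap_cell(value):
    """Collapse line breaks inside a wrapped cell into single spaces."""
    if not isinstance(value, str):
        return value  # leave numbers and NaN untouched
    # "\s*[\r\n]+\s*" eats "\r", "\n", "\r\n" and any spaces around them.
    return re.sub(r"\s*[\r\n]+\s*", " ", value).strip()
```

On the DataFrame returned by `tabula.read_pdf`, pandas can apply the same pattern in one call with `table[0].replace(r"\s*[\r\n]+\s*", " ", regex=True)`.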

1 vote, 2 answers

Using Tabula to pull tables out of a PDF

We have standard reports uploaded as PDFs on a daily basis. The PDFs contain some tables that we want to pull into datasets. I have tabula imported in Code Repositories, but I can't seem to get Code Repositories to bring in the PDF. I receive this…

tabula palantir-foundry

asked Jan 12 '22 at 22:25

Connor

1 vote, 0 answers

Tabula Java Heap Error — only 1 page to convert

I want to extract tables from a 1-page PDF (50 KB) using Tabula, but it returns this error: 2022-01-08 17:33:25.054:INFO:oejsh.ContextHandler:main: Started…

java pdf heap-memory tabula

asked Jan 08 '22 at 15:00

Maria Kasakowa

1 vote, 1 answer

tabula-py can't read a file when the Python script is called from Java

I'm working on a Java-based project. The Java program runs a command to call a Python script, which uses tabula-py to read a PDF file and return the data. The Python script worked when I called it directly in the terminal…

python java tabula tabula-py

asked Nov 29 '21 at 04:09

Fong Tom

1 vote, 0 answers

Python pandas df - Columns must be same length as key

I have a dataframe I created by scraping this PDF with tabula. I'm trying to create a point column using geocoder, but I keep getting a "Columns must be same length as key" error. My code, as well as a link to the PDF, is below: PDF:…

python pandas dataframe tabula

asked Nov 23 '21 at 02:56

Adam

1 vote, 1 answer

Error with tabula in python regarding dependency (colab and locally)

I am working on extracting data from a number of PDF documents in Python, testing in Colab. A solution on Colab would be great, but a local one would also work if that is not possible. There are a lot of interesting entries per page, so I chose tabula. Code works…

python pdf tabula

asked Oct 26 '21 at 06:37

Andreas Theil

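tabula-py is a wrapper around tabula-java and runs it as a `java` subprocess (the `['java', '-Dfile.encoding=UTF8', '-jar', …]` command in the "Unable to read pdf using tabula-py" question shows this), so a missing Java runtime is one possible dependency problem on Colab or locally. The question text is truncated, so that is an assumption. A quick check from Python using only the standard library; the helper name `java_version` is hypothetical.

```python
import shutil
import subprocess


def java_version():
    """Return the `java -version` banner, or None if no java executable is on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` traditionally prints to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()
```

A `None` result means tabula-py would not find Java in this environment; a version string means Java is present and the problem lies elsewhere.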

1 vote, 0 answers

Unable to read pdf using tabula-py

I am trying to parse a pdf using tabula-py but I keep getting this error stack - CalledProcessError(1, ['java', '-Dfile.encoding=UTF8', '-jar',…

python tabula tabula-py

asked Jul 24 '21 at 13:42

shekwo

1 vote, 1 answer

tabula-py doesn't recognise columns correctly

I am trying to parse a PDF document using tabula. I use this code: df = tabula.read_pdf(io.BytesIO(content), pages=12,pandas_options={'header': None}, multiple_tables = True,columns=(78.39, 226.97, 280.97,370.04,461.02,550.06)) However, after…

python python-3.x pdf tabula

asked Jun 10 '21 at 09:17

Vasilieva Polina

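tabula-py's `columns` option takes the x-coordinates of column boundaries in PDF points (1/72 inch), measured from the left edge of the page. If the boundaries were read off a rendered page image (an assumption; the question doesn't say how they were obtained), they need converting first, or the splits land in the wrong places. The helpers below are hypothetical and only do that conversion plus an ordering check.

```python
def px_to_pt(px: float, dpi: float) -> float:
    """Convert a pixel position on a page image rendered at `dpi` into PDF points."""
    # 1 inch = dpi pixels = 72 points.
    return px * 72.0 / dpi


def columns_from_pixels(xs, dpi):
    """Convert pixel x-positions into strictly increasing column boundaries in points."""
    cols = [round(px_to_pt(x, dpi), 2) for x in xs]
    if any(b <= a for a, b in zip(cols, cols[1:])):
        raise ValueError("column boundaries must be strictly increasing")
    return cols
```

For example, a boundary at pixel 327 on a 300 dpi render is 327 * 72 / 300 = 78.48 pt.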
