Reading a table with blank cells with tabula-py

Asked Sep 18 '19 at 07:11

Active Sep 18 '19 at 07:12

Viewed 1,276 times

I am trying to load a large table (an example is attached) from a Form 10-K filing into Python using tabula-py. The table does not have clear borders and has a lot of blank cells, which causes several issues.

My code is

import tabula

df = tabula.read_pdf("firm_xxx_10K.pdf", pages="100-101", guess=True, stream=True, columns=(144, 210, 300, 340, 380, 420, 450))

With stream=True, I get all the data, but a cell whose text spans multiple lines is split into separate rows. With lattice=True, the multi-line cells are correctly recognized as single cells, but the result is missing a lot of observations.
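One workaround that may be worth trying is to keep stream=True and merge the split rows afterwards with pandas. The following is only a minimal sketch: it assumes that a continuation row leaves the first column empty, which may not hold for a table with this many blank cells. The helper name merge_continuation_rows and the sample data are hypothetical and are not part of tabula-py.

```python
import pandas as pd


def merge_continuation_rows(df, key_col=0):
    """Fold rows whose key column is blank into the preceding row.

    Hypothetical helper: assumes a continuation line produced by
    stream mode has an empty cell in the column at position key_col.
    """
    key = df.iloc[:, key_col]
    # A row starts a new record when its key cell is non-empty.
    is_start = key.notna() & (key.astype(str).str.strip() != "")
    # Every row up to the next record start shares one group id.
    group_id = is_start.cumsum().to_numpy()

    def join_cells(cells):
        # Concatenate the non-blank pieces of a cell, top to bottom.
        # Blank cells become empty strings; all values become strings.
        parts = [str(c).strip() for c in cells if pd.notna(c) and str(c).strip()]
        return " ".join(parts)

    return df.groupby(group_id).agg(join_cells).reset_index(drop=True)


# Made-up data standing in for the stream-mode output.
raw = pd.DataFrame({
    "Name": ["Alpha Corp", None, "Beta LLC"],
    "Description": ["Senior notes", "due 2025", "Term loan"],
    "Amount": ["1,000", None, "250"],
})
merged = merge_continuation_rows(raw)
print(merged)
```

Depending on the tabula-py version, read_pdf may return a list of DataFrames rather than a single one, in which case the helper would be applied to each element.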

Is there a better way to set the options? I have tried many combinations, but I am now stuck. Any help is much appreciated.

Example of the Table I am Trying to Read

edited Sep 18 '19 at 07:12

bharatk


asked Sep 18 '19 at 07:11

ynchoir


0 Answers