I use Camelot in Python to extract tables from a PDF file. My code is as follows:

import camelot

tables = camelot.read_pdf(r'file_to_path',
                          flavor='lattice', pages='1',
                          shift_text=[''])

The problem is that Camelot doesn't recognize all the tables. I ran this code to debug the issue visually:

camelot.plot(tables[0],kind='contour').show()

In the resulting contour plot it's clear that the fourth table was not recognized. I assume that's because it has a different shape: that table has only rows and no columns.

Is there any way to handle this issue?

data_b77
I'm now trying to figure out how table areas work. I think I'll use the _bbox property of all the parsed tables, check whether there is any space between them, and pass that space to table_areas when reading the PDF. – data_b77 Feb 03 '22 at 16:44
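
The idea in this comment could be sketched roughly as follows. This is only an illustration, not tested against the PDF in question. The helper name `gaps_between` and the `min_height` threshold are made up for the example, and it assumes each `_bbox` is an `(x1, y1, x2, y2)` tuple in PDF coordinates with `(x1, y1)` the lower-left and `(x2, y2)` the upper-right corner. `_bbox` is a private attribute, so that layout may differ between Camelot versions.

```python
def gaps_between(bboxes, min_height=5.0):
    """Return table_areas strings for vertical gaps between detected tables.

    Assumes each bbox is (x1, y1, x2, y2) in PDF coordinates, with
    (x1, y1) the lower-left and (x2, y2) the upper-right corner.
    """
    # Order the boxes from the top of the page to the bottom.
    ordered = sorted(bboxes, key=lambda b: b[3], reverse=True)
    areas = []
    for upper, lower in zip(ordered, ordered[1:]):
        gap_top = upper[1]      # bottom edge of the upper table
        gap_bottom = lower[3]   # top edge of the lower table
        if gap_top - gap_bottom >= min_height:
            left = min(upper[0], lower[0])
            right = max(upper[2], lower[2])
            # table_areas expects "x1,y1,x2,y2" with (x1, y1) the top-left
            # and (x2, y2) the bottom-right corner.
            areas.append(f"{left:g},{gap_top:g},{right:g},{gap_bottom:g}")
    return areas


# Hypothetical boxes: two tables with a large gap, then one almost touching.
example = [(50, 600, 500, 700), (50, 400, 500, 500), (50, 380, 500, 398)]
print(gaps_between(example))
```

The resulting strings could then be passed as `table_areas=gaps_between([t._bbox for t in tables])` in a second `camelot.read_pdf` call. Note that this only finds gaps between detected tables, so a missed table above the first or below the last one would need separate handling.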

1 Answer

What worked for me was passing line_scale=40 as an additional argument to camelot.read_pdf.
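
Since the right value probably depends on the document, one way to pick it is to compare how many tables the lattice parser finds at a few `line_scale` settings. The sketch below is a minimal illustration: `tables_per_line_scale` and its `reader` parameter are hypothetical names introduced here, and the candidate values are arbitrary apart from Camelot's default of 15 and the 40 that worked in this case.

```python
def tables_per_line_scale(path, scales=(15, 40, 60), pages="1", reader=None):
    """Count the tables detected by the lattice parser for each line_scale."""
    if reader is None:
        # Imported lazily so the helper can be defined without camelot installed.
        import camelot
        reader = camelot.read_pdf
    counts = {}
    for scale in scales:
        # A larger line_scale lets the parser pick up shorter lines.
        tables = reader(path, flavor="lattice", pages=pages, line_scale=scale)
        counts[scale] = len(tables)
    return counts
```

For example, `tables_per_line_scale(r'file_to_path')` returns a dictionary mapping each scale to the number of tables found. Very large values can cause text to be detected as lines, so it is worth checking the result with `camelot.plot` as in the question.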

data_b77