Camelot-py not detecting tables with two rows

Question

I am scraping table data from a PDF using Camelot-py, and it is not detecting tables that have only one or two rows.

PDF I am trying to read:

Code used to read tables:

import camelot

abc = camelot.read_pdf('IR-O-U-0436.pdf', pages="all")

The output I am getting:

From the images, you can see that the sponsored research table is read into abc[15] and the second part of the consultancy project details table is read into abc[16], but Camelot misses the first part of the consultancy project details table.

Any insight would be greatly appreciated.

To obtain useful help, please provide the original PDF. – Stefano Fiorucci - anakin87, Nov 12 '21 at 11:15

score 0 · Answer 1 · answered Dec 23 '21 at 05:38

I had similar tables in some of my PDFs that Camelot did not detect. After passing the "line_scale" parameter to the read_pdf function, it detected those tables as well. You have to find the specific "line_scale" value that returns all the tables regardless of how many rows they have. For me

line_scale = 35

worked fine. You can check for yourself.
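If you want to check several values quickly, something like the sketch below might help. It assumes the tables have ruling lines, since line_scale only applies to the lattice flavor (where the default is 15). The helper name count_tables_by_line_scale, the candidate values, and the reader argument are my own and purely illustrative; only camelot.read_pdf with its pages, flavor and line_scale arguments comes from the library.

```python
def count_tables_by_line_scale(pdf_path, scales=(15, 25, 35, 45), pages="all", reader=None):
    """Return a dict mapping each line_scale value to the number of tables found.

    `reader` is a hypothetical hook so the loop can be exercised without a PDF;
    when it is None, camelot.read_pdf is used.
    """
    if reader is None:
        import camelot  # imported lazily so the helper can be loaded without Camelot
        reader = camelot.read_pdf

    counts = {}
    for scale in scales:
        # line_scale is a lattice-only option; larger values pick up shorter lines
        tables = reader(pdf_path, pages=pages, flavor="lattice", line_scale=scale)
        counts[scale] = len(tables)
    return counts


# Example usage (file name taken from the question):
# print(count_tables_by_line_scale('IR-O-U-0436.pdf'))
```

A higher table count is not automatically better. The Camelot docs warn that a very large line_scale can cause text to be detected as lines, so it is worth looking at the extracted tables for the value you pick.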
