
I am scraping table data from a PDF using Camelot-py, and it is not detecting tables that have only one or two rows.

PDF I am trying to read: [image]

Code used to read tables:

import camelot
abc = camelot.read_pdf('IR-O-U-0436.pdf', pages="all")

The output I am getting:
[image]

From the images, you can see that the sponsored research table is read into abc[15] and the second part of the consultancy project details table is read into abc[16]. The first part of the consultancy project details table, however, is missed by Camelot.

Any insight would be greatly appreciated.

Pawara Siriwardhane

1 Answer


I had similar tables in some PDFs that Camelot did not detect. After passing the line_scale parameter to the read_pdf function, it detected those tables as well. line_scale applies to the default lattice flavor; its default value is 15, and larger values let Camelot pick up shorter ruling lines, although a very large value can cause text to be detected as lines. You need to find the specific value that returns all of your tables regardless of how many rows they have. For me

line_scale = 35

worked fine. You can check for yourself.
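To make "check for yourself" concrete, here is a rough sketch of trying several line_scale values and keeping the smallest one that yields the most tables. The helper names (find_line_scale, read_with_camelot) and the candidate values are my own and not part of Camelot; only camelot.read_pdf with flavor, pages, and line_scale is the library's API. Note that "most tables" is only a heuristic: a very large line_scale can produce spurious tables, so inspect the results before settling on a value.

```python
def find_line_scale(read_tables, candidates=(15, 25, 35, 45, 60)):
    """Try each candidate and return (line_scale, tables) for the one
    that yields the most tables. Ties keep the smaller line_scale.

    read_tables is any callable that takes a line_scale and returns a
    sized collection of tables (hypothetical helper, not Camelot API).
    """
    best_scale, best_tables = None, None
    for scale in candidates:
        tables = read_tables(scale)
        # Strictly greater, so an earlier (smaller) scale wins ties.
        if best_tables is None or len(tables) > len(best_tables):
            best_scale, best_tables = scale, tables
    return best_scale, best_tables


def read_with_camelot(path, line_scale):
    """Read all pages of a PDF with the lattice flavor at a given line_scale."""
    import camelot  # imported lazily so the helper above has no dependency

    return camelot.read_pdf(
        path, pages="all", flavor="lattice", line_scale=line_scale
    )
```

With the PDF from the question, this could be called as find_line_scale(lambda s: read_with_camelot('IR-O-U-0436.pdf', s)), and the returned tables can then be checked by eye (for example with tables[i].df) to confirm the one-row and two-row tables are present.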

Megha Sirisilla