Some tables are missing while extracting from PDF using Camelot

Asked Jul 17 '21 at 07:32

Active Jul 17 '21 at 07:38

I tried to extract table data from a multi-page, multi-table PDF using the following code:

import camelot

tables = camelot.read_pdf('InputPDF.pdf', flavor='stream', multiple_tables=True, pages='all')
tables.export('foo1.csv', f='csv', compress=True)  # other formats: json, excel, html

However, tables 4 and 5 on page 2 are not extracted, although tables of the same type on other pages are extracted properly.

I have attached an image of the PDF file I used as an example.

No error is shown.

edited Jul 17 '21 at 07:38 by Nimantha

asked Jul 17 '21 at 07:32 by Kavita Polasa

An image is usually not helpful; the actual PDF is usually required for analysis. – mkl Jul 17 '21 at 07:52
I am unable to share the PDF on Stack Overflow; you can check the PDF at https://github.com/atlanhq/camelot/issues/464 – Kavita Polasa Jul 17 '21 at 13:30
I don't know the details of Camelot, but I saw that in your document the fourth and fifth tables are very short, only one or two rows each. As tables in PDFs are usually not marked as such, heuristics have to recognize them. The Camelot heuristics are probably not convinced by default when there is so little to go on; you can probably tweak Camelot to be more easily convinced by changing some settings. – mkl Jul 20 '21 at 09:14
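
A minimal sketch of the kind of tweaking suggested in the last comment, assuming the stream flavor is kept. `edge_tol`, `row_tol` and `table_areas` are documented Camelot stream options, but whether any of them recovers the two short tables in this particular PDF is an assumption, and the values and coordinates in the commented calls are hypothetical placeholders. The helper names `stream_kwargs` and `extract` are made up for illustration.

```python
def stream_kwargs(page, edge_tol=None, row_tol=None, table_areas=None):
    """Build keyword arguments for camelot.read_pdf with flavor='stream'.

    Only options that are explicitly set are included, so Camelot's own
    defaults apply to everything else.
    """
    kwargs = {"pages": str(page), "flavor": "stream"}
    if edge_tol is not None:
        # Tolerance used when extending text edges vertically while
        # guessing table areas; a larger value can change the guessed areas.
        kwargs["edge_tol"] = edge_tol
    if row_tol is not None:
        # Vertical tolerance used when grouping text into rows.
        kwargs["row_tol"] = row_tol
    if table_areas is not None:
        # Camelot expects each area as an "x1,y1,x2,y2" string in PDF
        # coordinates: (x1, y1) is top-left, (x2, y2) is bottom-right.
        kwargs["table_areas"] = [
            ",".join(str(v) for v in area) for area in table_areas
        ]
    return kwargs


def extract(pdf_path, **kwargs):
    """Run Camelot and print each table's parsing report."""
    import camelot  # imported lazily so stream_kwargs works without Camelot

    tables = camelot.read_pdf(pdf_path, **kwargs)
    for table in tables:
        print(table.parsing_report)
    return tables


# Hypothetical usage on page 2 only (values are guesses, not tested on this PDF):
# extract("InputPDF.pdf", **stream_kwargs(2, edge_tol=500))
# extract("InputPDF.pdf", **stream_kwargs(2, table_areas=[(50, 400, 550, 350)]))
```

Running the extraction on page 2 alone makes it easier to compare the number of detected tables before and after changing a setting. If the guessed areas still miss the short tables, passing their coordinates through `table_areas` bypasses the area detection for that page.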

0 Answers