
I tried to extract table data from a multi-page, multi-table PDF using the following code:

```python
import camelot

tables = camelot.read_pdf('InputPDF.pdf', flavor='stream', multiple_tables=True, pages='all')
tables.export('foo1.csv', f='csv', compress=True)  # json, excel, html
```

However, the fourth and fifth tables on page 2 are not extracted, while tables of the same type on the other pages are extracted correctly.

I have attached an image of the PDF I used as an example.

No error is shown.
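One way to narrow the problem down is to count how many tables camelot returned for each page and compare that with what you expect. The helpers below are a minimal sketch: `tables_per_page` and `missing_tables` are hypothetical names, the expected counts are typed in by hand, and the only assumption about camelot is that each returned table exposes a `page` attribute.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict, Iterable, Mapping, Tuple


def tables_per_page(tables: Iterable) -> Counter:
    """Count tables per page.

    Assumes each item has a ``page`` attribute, as camelot's Table objects
    do, so the TableList from camelot.read_pdf can be passed in directly.
    """
    return Counter(int(t.page) for t in tables)


def missing_tables(
    tables: Iterable, expected: Mapping[int, int]
) -> Dict[int, Tuple[int, int]]:
    """Return {page: (expected, found)} for pages with too few tables.

    ``expected`` is a hand-written (hypothetical) map of page -> table count.
    """
    found = tables_per_page(tables)
    return {
        page: (want, found[page])  # Counter returns 0 for pages with no tables
        for page, want in expected.items()
        if found[page] < want
    }


# Demo with stand-in objects; the counts are made up, not taken from the PDF.
fake_tables = [SimpleNamespace(page=p) for p in (1, 1, 2, 2, 2)]
print(missing_tables(fake_tables, {1: 2, 2: 5}))  # {2: (5, 3)}
```

With the real file this would look like `missing_tables(camelot.read_pdf('InputPDF.pdf', flavor='stream', pages='all'), {2: 5})`. Each table's `parsing_report` (accuracy, whitespace, order, page) may also help show which of the detected tables are which.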

Nimantha
  • An image is usually not helpful; the actual PDF is usually required for analysis. – mkl Jul 17 '21 at 07:52
  • I am unable to share the PDF on Stack Overflow; you can check the PDF at https://github.com/atlanhq/camelot/issues/464 – Kavita Polasa Jul 17 '21 at 13:30
  • I don't know the details of camelot, but I saw that in your document the fourth and fifth tables are very short, only one or two rows each. As tables in PDFs usually are not marked as such, heuristics have to recognize them. Probably camelot's default heuristics are not convinced by so little to go on; you can probably tweak camelot to be convinced more easily by changing some settings. – mkl Jul 20 '21 at 09:14
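If the short tables on page 2 are simply too small for the stream heuristic, one workaround to try is telling camelot where they are via the stream flavor's `table_areas` option, which takes strings of the form `"x1,y1,x2,y2"` with (x1, y1) as the top-left and (x2, y2) as the bottom-right corner in PDF coordinate space. The helper `to_table_areas` below is hypothetical, and the coordinates in the demo are placeholders, not measurements from the asker's PDF; the real values would have to be read off the page, for example with `camelot.plot(table, kind='text')`.

```python
from typing import Iterable, List, Tuple

# (x1, y1, x2, y2): top-left and bottom-right corners in PDF points,
# with y measured upward from the bottom of the page.
Box = Tuple[float, float, float, float]


def to_table_areas(boxes: Iterable[Box]) -> List[str]:
    """Format boxes as "x1,y1,x2,y2" strings for camelot's table_areas."""
    areas = []
    for x1, y1, x2, y2 in boxes:
        # Top-left must be left of and above bottom-right in PDF space.
        if not (x1 < x2 and y1 > y2):
            raise ValueError(f"expected top-left then bottom-right: {(x1, y1, x2, y2)}")
        areas.append(",".join(str(v) for v in (x1, y1, x2, y2)))
    return areas


# Placeholder coordinates for two short tables on page 2 (not measured).
areas = to_table_areas([(40, 520, 560, 480), (40, 300, 560, 260)])
print(areas)  # ['40,520,560,480', '40,300,560,260']
```

The list would then be passed as `camelot.read_pdf('InputPDF.pdf', flavor='stream', pages='2', table_areas=areas)`. Because camelot then looks only inside the given areas, page 2 would be read in a separate call and its tables combined with those from the other pages. Other stream options such as `row_tol` and `edge_tol` may also be worth experimenting with.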