
I have observed that camelot is not detecting nested tables in my sample document. In the attached image, I get only one table, extracted as a whole. Is there any way to detect the inner tables as well?


Megha Sirisilla
  • Please explain more clearly what the desired output is and show what you tried. In any case, you can manually specify a list of table_areas, one for each table (https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-areas). – Stefano Fiorucci - anakin87 Dec 23 '21 at 09:29
  • I want the inner tables, i.e. the Fee Schedule table, to be detected along with the other tables like Contract Timelines, etc. Currently, when I pass the PDF to camelot, it just gives me one whole table below the line 'The contract is extracted between ABCD ....'. I know we can pass table_regions or table_areas, but I don't want to opt for that. I have other PDFs that differ from this one. I want a generalised solution, if there is one. – Megha Sirisilla Dec 23 '21 at 10:23
  • Ok. You can try to pass `table_regions` (https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-regions), specifying a fixed restricted part of the page. Maybe if you exclude outer lines, the detection would be more similar to the desired result... Let me know if this works... – Stefano Fiorucci - anakin87 Dec 23 '21 at 10:31
  • Yes. This solution works. I have tried it with table_regions specified. But there is no way to get this result without specifying table_regions, right? I have many other PDFs too, which have straightforward tables unlike this one, so I wanted a generalised solution. I guess I have to check the lines in the PDF and then apply table_regions accordingly. – Megha Sirisilla Dec 23 '21 at 10:58
  • If you are lucky, the table region is the same in the various documents. Maybe, I'm going to write an answer, starting from my last comment... – Stefano Fiorucci - anakin87 Dec 23 '21 at 11:13
  • Yes. I need to look into all the sample pdfs I have and come to a conclusion. Thanks for your suggestion though. – Megha Sirisilla Dec 23 '21 at 11:48
  • If useful, please accept/upvote my answer. – Stefano Fiorucci - anakin87 Jan 02 '22 at 17:57

1 Answer


To extract only the inner tables programmatically, you can try passing the `table_regions` parameter, specifying a fixed, restricted part of the page.

When `table_regions` is specified, Camelot looks for tables only inside the specified regions.
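
A minimal sketch of this idea follows. It assumes the inner tables sit inside a known rectangle and that shrinking the rectangle slightly keeps the outer border lines out of the analysis. The helper names (`region_string`, `extract_inner_tables`), the file name `contract.pdf`, and all coordinates are hypothetical placeholders, not values from the asker's document. Camelot expects each region as an `"x1,y1,x2,y2"` string in PDF coordinate space, where (x1, y1) is the left-top corner and (x2, y2) the right-bottom corner, so `y1` is larger than `y2`.

```python
def region_string(left, top, right, bottom, inset=0.0):
    """Format a rectangle as Camelot's "x1,y1,x2,y2" region string.

    Coordinates are PDF points with the origin at the bottom-left of the
    page, so `top` must be greater than `bottom`. `inset` (hypothetical
    convenience) shrinks the rectangle on every side, e.g. to leave the
    outer border lines of the enclosing table out of the region.
    """
    x1, y1 = left + inset, top - inset
    x2, y2 = right - inset, bottom + inset
    if x1 >= x2 or y2 >= y1:
        raise ValueError("region is empty after applying the inset")
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


def extract_inner_tables(pdf_path, regions, pages="1"):
    """Run Camelot's lattice parser only inside the given regions."""
    import camelot  # third-party: pip install camelot-py

    return camelot.read_pdf(
        pdf_path,
        pages=pages,
        flavor="lattice",
        table_regions=regions,
    )


if __name__ == "__main__":
    # Placeholder rectangle; measure the real one for your own PDF.
    inner = region_string(50, 700, 550, 300, inset=5)
    tables = extract_inner_tables("contract.pdf", [inner])
    for table in tables:
        print(table.df)
```

Whether the inner tables are then detected separately depends on the document, so the inset and the rectangle usually need some tuning per layout. Camelot's plotting utilities can help with finding suitable coordinates.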