I am using Camelot to extract multiple sections of a PDF with the following command:

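import camelot

# Each table_areas entry is "x1,y1,x2,y2": top-left and bottom-right corners in PDF coordinates
cgl_section = camelot.read_pdf(filename, flavor='stream',
              table_areas=['35,490,155,483', '53,480,110,470', '117,480,155,470',
                           '38,469,106,456', '39,454,105,445', '38,430,155,420',
                           '38,418,77,410'])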

This runs fine when the PDF actually contains data in these areas. However, I don't expect data in every area of every PDF that is parsed, and some areas come back empty. When the content of an area is not a table and has just one column, I get the following warning and error:

UserWarning: No tables found in table area 1

and

ValueError: min() arg is an empty sequence

I need a way to extract these specific areas from every PDF and ignore the ones that turn out to be empty, while still being able to use the extracted data in an orderly way.

Open to any other suggestions as well

TIA

A.A. F
  • Can you add the above code in try block `exception`, I'm not aware of camelot. – Mohamed Thasin ah Jan 02 '19 at 10:12
  • Try block will still try to run the command and will face errors in one or more of the tables and abandon the command altogether. I need to be able to extract whatever data is available without the empty tables giving errors. – A.A. F Jan 02 '19 at 10:27

2 Answers

0

Maybe the option table_regions (introduced in 0.7) can help you.

https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-regions

When table_regions is specified, Camelot will only analyze the specified regions to look for tables.
0

I'm having the exact same issue! This isn't a perfect solution, but I believe you can get around it by moving each table_areas entry that may result in an empty table into its own read_pdf call. You can then do what was suggested above and wrap that read_pdf call in a try/except block, so that a failure in one area doesn't abort the extraction of the others. This should give you the robustness you're looking for.
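
A minimal sketch of that idea, assuming the ValueError from the question is raised by read_pdf itself when an area is empty. The helper name extract_areas and its read_pdf parameter are made up for illustration and are not part of Camelot. By default it calls the real camelot.read_pdf:

```python
import warnings


def extract_areas(filename, areas, read_pdf=None):
    """Run one read_pdf call per area so an empty area cannot abort the rest.

    `extract_areas` and the `read_pdf` parameter are hypothetical names for
    this sketch; by default the real camelot.read_pdf is used.
    """
    if read_pdf is None:
        import camelot  # imported lazily; only needed for the default reader
        read_pdf = camelot.read_pdf

    results = {}
    for area in areas:
        try:
            with warnings.catch_warnings():
                # Silence "No tables found in table area ..." for empty areas.
                warnings.simplefilter("ignore", UserWarning)
                tables = read_pdf(filename, flavor="stream", table_areas=[area])
        except ValueError:
            # Assumption: this is the "min() arg is an empty sequence" case.
            results[area] = None
            continue
        # Keep the first table's DataFrame, or None if nothing was found.
        results[area] = tables[0].df if len(tables) > 0 else None
    return results
```

Calling extract_areas(filename, areas) with the list of area strings from the question would return a dict keyed by area string, in the same order as the input list, with None for every area that came back empty. Filtering out the None values then leaves only the areas that actually held data. The trade-off is one read_pdf call per area instead of one per file, so the PDF is parsed several times.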