Extracting tables from similarly structured PDF files using Camelot sometimes throws 'ValueError: min() arg is an empty sequence'

Question

I am using Python 3.11 to extract tables from several chemical analysis PDF files that are all structured the same way: general information such as the date and sample number at the top, followed by the actual measurements. My code worked well for more than a hundred files but fails on two of them with `ValueError: min() arg is an empty sequence`. The traceback points to this line: `xmin = min([t.x0 for direction in t_bbox for t in t_bbox[direction]])`.
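The expression in the traceback can only fail this way when the list it builds is empty, which suggests that no text objects were collected for the area being processed in those two files. Below is a minimal sketch that reproduces just that failure without Camelot. The shape of `t_bbox` (a dict mapping a direction to a list of text objects) is an assumption inferred from the expression, and the `SimpleNamespace` objects are hypothetical stand-ins for whatever Camelot stores there.

```python
from types import SimpleNamespace


def leftmost_x(t_bbox):
    # Same expression as in the traceback: smallest x0 over all text objects.
    return min([t.x0 for direction in t_bbox for t in t_bbox[direction]])


# Hypothetical stand-ins: the text objects only need an x0 attribute here.
with_text = {
    "horizontal": [SimpleNamespace(x0=420.0)],
    "vertical": [SimpleNamespace(x0=355.5)],
}
without_text = {"horizontal": [], "vertical": []}

print(leftmost_x(with_text))  # smallest x0 among the stand-in objects

try:
    leftmost_x(without_text)
except ValueError as exc:
    # Raised because the list comprehension produced no elements.
    print(f"ValueError: {exc}")
```

If that reading is right, the thing to investigate in the two failing files is why no text ends up inside the requested area, rather than the `min()` call itself.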

The call in my code is: `data = camelot.read_pdf(filename, pages='all', flavor='stream', strip_text='\n', table_areas=['350,850,580,30'], columns=['420,490'])`. If I remove the `table_areas` and `columns` arguments, the function runs, but it does not detect the whole page. This is visible in the plot I get from running `camelot.plot(data[0], kind='contour').show()`: the detected area covers neither the whole page nor the data I actually want, which is the two right-most columns. For reference, I also plotted a similar file that works with the `table_areas` and `columns` arguments, and there the detected area looks the way it should.
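One cheap sanity check on these arguments, sketched below without Camelot: Camelot documents `table_areas` entries as `x1,y1,x2,y2` strings and `columns` entries as comma-separated x-coordinates of the column separators, so the separators should fall between the area's two x values. The helper name `columns_inside_area` is hypothetical and not part of Camelot.

```python
def columns_inside_area(table_area: str, columns: str) -> bool:
    """Return True if every column separator lies strictly inside the area's x-range."""
    x1, _y1, x2, _y2 = (float(v) for v in table_area.split(","))
    left, right = min(x1, x2), max(x1, x2)
    return all(left < float(c) < right for c in columns.split(","))


# Values taken from the call in the question.
ok = columns_inside_area("350,850,580,30", "420,490")
print(ok)
```

This only validates the arguments against each other. It says nothing about whether the text in a particular PDF actually sits inside that area, which would have to be checked per file.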

As I said, every PDF file is structured similarly and I couldn't find any difference between the working files and the 2 files that cause the error.

Any help would be greatly appreciated!

I tried extracting the data from the PDF file and expected to receive a DataFrame containing the two right-most columns. Instead, I got the error described above. Letting Camelot auto-detect the table was also unsuccessful.

Related question: https://stackoverflow.com/questions/54004215/python-camelot-extracting-empty-tables Did you try using `table_regions` instead of `table_areas`? Unresolved github issue: https://github.com/camelot-dev/camelot/issues/263 — Stefano Fiorucci - anakin87, May 14 '23 at 15:49
When I use `table_regions` (without `columns` argument) I receive the following error: `ZeroDivisionError: float division by zero` from this line: `average_textline_height = sum_textline_height / float(len(textlines))`. — Tomer, May 15 '23 at 05:25
@StefanoFiorucci-anakin87, do you have any insights? So far the only "solution" I found is handling the exception and printing a message that manual work is required. — Tomer, May 16 '23 at 08:36
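For anyone landing here, the stopgap described in the last comment (catch the exception and flag the file for manual work) might look roughly like the sketch below. The function name `extract_or_flag` and the `reader` parameter are hypothetical. `reader` is any callable that takes a filename and returns the extracted tables, so the Camelot call itself stays outside the helper.

```python
def extract_or_flag(filenames, reader):
    """Run reader on each file and collect the files that need manual work."""
    results = {}
    needs_manual = []
    for name in filenames:
        try:
            results[name] = reader(name)
        except (ValueError, ZeroDivisionError) as exc:
            # The two error types reported in this thread.
            print(f"{name}: manual work required ({exc})")
            needs_manual.append(name)
    return results, needs_manual
```

With Camelot installed, `reader` could be something like `lambda f: camelot.read_pdf(f, pages='all', flavor='stream', strip_text='\n', table_areas=['350,850,580,30'], columns=['420,490'])`. This does not fix the underlying problem. It only keeps a batch run going and produces a list of files to look at by hand.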

0 Answers