I am using Python 3.11 to extract tables from several chemical-analysis PDF files that all share the same structure: general information such as the date and sample number at the top, followed by the actual measurements. My code has worked well for more than a hundred files, but it fails on 2 of them.
When I run it on those two files, it raises `ValueError: min() arg is an empty sequence`, more specifically from `xmin = min([t.x0 for direction in t_bbox for t in t_bbox[direction]])`.
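As far as I can tell, the error only means that the list comprehension produced nothing, i.e. no text objects were found in any direction. Below is a minimal sketch that mimics the failing expression. The `SimpleNamespace` objects and the `"horizontal"`/`"vertical"` keys are stand-ins I made up for illustration, not Camelot's actual objects.

```python
from types import SimpleNamespace


def text_xmin(t_bbox):
    """Same expression as in the traceback: smallest x0 over all directions."""
    return min([t.x0 for direction in t_bbox for t in t_bbox[direction]])


# Hypothetical stand-ins for text objects found inside the table area.
populated = {
    "horizontal": [SimpleNamespace(x0=355.0), SimpleNamespace(x0=421.5)],
    "vertical": [],
}
# Same shape, but no text was found in either direction.
empty = {"horizontal": [], "vertical": []}

print(text_xmin(populated))

try:
    text_xmin(empty)
except ValueError as exc:
    # min() of an empty list raises ValueError, as in the traceback.
    print(f"ValueError: {exc}")
```

So my guess is that, for the two failing files, no text ends up inside the area I specify, but I don't see why that would be.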
The line in my code that triggers it is: `data = camelot.read_pdf(filename, pages='all', flavor='stream', strip_text='\n', table_areas=['350,850,580,30'], columns=['420,490'])`.
If I remove the `table_areas` and `columns` arguments, the function works, but it doesn't detect the whole page. Here is the plot I get from running `camelot.plot(data[0], kind='contour').show()`. As you can see, the detected region also misses the relevant data I want, which is the two right-most columns. For reference, here is how the plot should have looked (from a similar file) with the `table_areas` and `columns` arguments.
As I said, every PDF file is structured similarly and I couldn't find any difference between the working files and the 2 files that cause the error.
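For context, here is a rough sketch of how the failing files can be isolated in a batch run. The helper names (`read_tables`, `find_failing_files`) and the `reader` parameter are only for illustration. The `camelot.read_pdf` call uses the same arguments as above.

```python
from typing import Callable, Dict, Iterable

TABLE_AREAS = ['350,850,580,30']
COLUMNS = ['420,490']


def read_tables(filename: str):
    """Read one PDF with the same arguments as in the question."""
    import camelot  # imported lazily so the helper below works without it

    return camelot.read_pdf(
        filename,
        pages='all',
        flavor='stream',
        strip_text='\n',
        table_areas=TABLE_AREAS,
        columns=COLUMNS,
    )


def find_failing_files(
    paths: Iterable[str],
    reader: Callable[[str], object] = read_tables,
) -> Dict[str, str]:
    """Return {path: error message} for every file whose read raises ValueError."""
    failures = {}
    for path in paths:
        try:
            reader(str(path))
        except ValueError as exc:
            failures[str(path)] = str(exc)
    return failures
```

Running `find_failing_files` over all of my PDF paths reports only the two problematic files, each with the `min()` error.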
Any help would be greatly appreciated!
I tried extracting the data from the PDF file and expected to receive a DataFrame containing the two right-most columns. Instead, I got the error mentioned above. Letting Camelot auto-detect the table was also unsuccessful.