I'm using camelot to read a PDF and print its tables, but it doesn't detect the tables I expect. When I ran the same PDF through an online PDF-to-Excel converter, I got the results I expected, so I assume the tables exist. I also highlighted the text in the PDF and noticed that it is laid out in a table format. I'm going to look at other options, but camelot lets me pick out specific tables, which is perfect for what I'm trying to do. My question is: why might camelot not find the tables, and is there anything else that could do this? Thank you.
I tried:
import camelot
file = "file.pdf"
# read every page, then print the third detected table
tables = camelot.read_pdf(file, pages="1-end")
print(tables[2].df)
and got this as a result:
IndexError: list index out of range
So I tried this, to see how many tables were detected:
import camelot
file = "file.pdf"
tables = camelot.read_pdf(file, pages="1-end")
print(tables.n)
and got 0.
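One possible explanation for the 0: camelot's default flavor is "lattice", which relies on ruling lines drawn between cells, while tables laid out only with whitespace generally need flavor="stream". Below is a minimal sketch of trying both, assuming that is what is happening with this PDF. The helper name read_tables_any_flavor is hypothetical (it is not part of camelot), and it takes the reader function as an argument so that camelot.read_pdf can be passed in.

```python
def read_tables_any_flavor(read_pdf, path, pages="1-end",
                           flavors=("lattice", "stream")):
    """Try each flavor in order and return (flavor, tables) for the
    first one that detects at least one table.

    read_pdf is assumed to behave like camelot.read_pdf: it accepts
    (path, pages=..., flavor=...) and returns a sequence of tables.
    Returns (None, []) if no flavor finds anything.
    """
    for flavor in flavors:
        tables = read_pdf(path, pages=pages, flavor=flavor)
        if len(tables) > 0:
            return flavor, tables
    return None, []


# Intended usage (requires camelot to be installed):
#   import camelot
#   flavor, tables = read_tables_any_flavor(camelot.read_pdf, "file.pdf")
#   if len(tables) > 2:          # guard against the IndexError above
#       print(flavor)
#       print(tables[2].df)
```

Checking the length before indexing avoids the IndexError when fewer than three tables are found. If "stream" also returns nothing, the PDF may be a scanned image, which camelot cannot read because it only works with text-based PDFs.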
The expected result should look something like this:
name id
job number
address none
address xyz
address date
company name
quarter report
date
Group       Manager  quarter1  quarter2  quarter3  quarter4  total
element2    A        $         $         $         $         $
notElement  B        $         $         $         $         $
card3       C        $         $         $         $         $
box4        D        $         $         $         $         $
element3    E        $         $         $         $         $
box1        F        $         $         $         $         $
notElement  B        $         $         $         $         $
notElement  C        $         $         $         $         $
card7       D        $         $         $         $         $
element4    E        $         $         $         $         $
quarter1 quarter2 quarter3 quarter4
average $ $
results none none
missed 1
missed 1