I'm using Camelot to read a PDF and print its tables, but it doesn't detect the tables I expect. An online PDF-to-Excel converter gave me the results I expected, so I assume the tables exist. When I highlight the text in the PDF, I can also see that it is laid out in table format. I'm going to look at other options, but Camelot lets me pick out specific tables, which is perfect for what I'm trying to do. Why might Camelot fail to find these tables, and is there anything else that could do this? Thank you.

I tried:

import camelot

file = "file.pdf"
tables = camelot.read_pdf(file, pages="1-end")
print(tables[2].df)

and got this as a result:

IndexError: list index out of range

so, i tried this:

file = "file.pdf"
tables = camelot.read_pdf(file, pages = "1-end")
print(tables.n)

and got 0, so Camelot found no tables at all (which explains the IndexError).

the expected results should be something like this:

name                                        id
job                                     number
address                                 none    
address                                 xyz 
address                                 date    
            company name                            
            quarter report                          
            date                            
Group   Manager     quarter1    quarter2    quarter3    quarter4                total
element2    A           $          $           $           $                      $
notElement  B           $          $           $           $                      $
card3       C           $          $           $           $                      $
box4        D           $          $           $           $                      $
element3    E           $          $           $           $                      $
box1        F           $          $           $           $                      $
notElement  B           $          $           $           $                      $
notElement  C           $          $           $           $                      $             
card7       D           $          $           $           $                      $
element4    E           $          $           $           $                      $
                                        
               quarter1 quarter2 quarter3 quarter4                      
average           $        $                                
results          none     none                              
missed                     1                                
missed                     1            
J. Doe
  • If you can, please post an example PDF. – Stefano Fiorucci - anakin87 Aug 05 '20 at 08:01
  • I'm not sure how to add a PDF to the post, but the link below is a sample PDF that has the same issue. Camelot says there are no tables in this sample. For some other PDFs, Camelot finds only 1 or 2 tables, and they are not ones I need. Ideally, I'd like to go through the whole transcript and scrape every table. If that's not really possible with Python, I'll use a PDF-to-Excel converter, but I'd appreciate any help. Thank you. https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/stelprdb5407090.pdf – J. Doe Aug 05 '20 at 20:23
  • Your PDF is somewhat unusual, and it is difficult for Camelot to extract information from it. In any case, if your tables don't have demarcated lines between cells, you should specify `flavor='stream'` (see https://camelot-py.readthedocs.io/en/master/user/how-it-works.html). – Stefano Fiorucci - anakin87 Aug 06 '20 at 06:44
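
To illustrate the `flavor='stream'` suggestion from the comment above, here is a minimal sketch. It tries Camelot's default `lattice` flavor first (which relies on ruled lines between cells) and falls back to `stream` (which relies on whitespace) when no tables are found. The helper name `extract_tables` and its `read_pdf` parameter are hypothetical and not part of Camelot. The only Camelot features assumed are `camelot.read_pdf(..., pages=..., flavor=...)` and the `.n` attribute already used in the question. Whether `stream` actually finds usable tables in this particular PDF is not guaranteed.

```python
def extract_tables(path, pages="1-end", read_pdf=None):
    """Try each Camelot flavor in turn; return (flavor, tables) for the first hit.

    `read_pdf` is a hypothetical hook so the function can be exercised
    without Camelot installed; by default it is camelot.read_pdf.
    """
    if read_pdf is None:
        import camelot  # third-party: pip install camelot-py
        read_pdf = camelot.read_pdf

    # "lattice" needs visible cell borders; "stream" groups text by spacing.
    for flavor in ("lattice", "stream"):
        tables = read_pdf(path, pages=pages, flavor=flavor)
        if tables.n > 0:
            return flavor, tables

    # Neither flavor detected anything.
    return None, None


# Example usage (assumes file.pdf exists in the working directory):
#
#     flavor, tables = extract_tables("file.pdf")
#     if tables is None:
#         print("No tables detected with either flavor")
#     else:
#         print(f"{tables.n} table(s) found with flavor={flavor!r}")
#         for table in tables:
#             print(table.df)
```

Checking the result before indexing avoids the `IndexError` from the question, since `tables[2]` is only valid when at least three tables were detected.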

0 Answers