
My PDF contains 16 tables on 3 pages, which I want to output to an Excel file as a single worksheet using Camelot. I can extract each page individually with no problems, but I cannot figure out how to handle all 3 pages in one pass. My code is shown below:

    # Read Obslog Page 1 to extract all the required tables
    obstables = camelot.read_pdf(filepath,
                                 pages='1', \
                                 flavor='stream', \
                                 edge_tol=500, \
                                 strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                 table_areas=[' 15, 750, 575, 680', \
                                              ' 15, 680, 575, 570', \
                                              ' 15, 570, 575, 460', \
                                              ' 15, 460, 575, 380', \
                                              ' 15, 380, 575, 300', \
                                              ' 15, 300, 575, 240', \
                                              ' 15, 240, 575, 180', \
                                              ' 15, 180, 575, 110'], \
                                 columns=['','','','','','','',''])
    # Read Obslog Page 2 to extract all the required tables
    obstables1 = camelot.read_pdf(filepath,
                                  pages='2', \
                                  flavor='stream', \
                                  edge_tol=500, \
                                  strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                  table_areas=[' 20, 820, 575, 750', \
                                               ' 20, 730, 140, 655', \
                                               ' 20, 635, 270, 560', \
                                               ' 20, 540, 270, 470'], \
                                  columns=['','','',''])
    # Read Obslog Page 3 to extract all the required tables
    obstables2 = camelot.read_pdf(filepath,
                                  pages='3', \
                                  flavor='stream', \
                                  edge_tol=500, \
                                  strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                  table_areas=[' 15, 820, 575, 750', \
                                               ' 15, 730, 575, 660', \
                                               ' 15, 640, 575, 570', \
                                               ' 15, 560, 150, 500', \
                                               ' 15, 480, 575, 390',] \
                                  columns=['','','','',''])

When I try to execute the script, the first line of the page 3 'table_areas' definition gives me the following syntax error:

    table_areas=[' 15, 820, 575, 750',
    ^^^^^^^^^^^^^^^^^^^^^^^^

I cannot see any syntax problem with this line.

I get the same error if I try to use the 'tables.append' option (as suggested by Anakin87 on 12/7/2021 in an answer to a similar post). In this case, I replaced the camelot calls for pages 2 and 3 with the following code:

    obstables._tables.append(camelot.read_pdf(filepath,
                                              pages='2', \
                                              flavor='stream', \
                                              edge_tol=500, \
                                              strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                              table_areas=[' 20, 820, 575, 750', \
                                                           ' 20, 730, 140, 655', \
                                                           ' 20, 635, 270, 560', \
                                                           ' 20, 540, 270, 470'], \
                                              columns=['','','','']))

    obstables._tables.append(camelot.read_pdf(filepath,
                                              pages='3', \
                                              flavor='stream', \
                                              edge_tol=500, \
                                              strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                              table_areas=[' 15, 820, 575, 750', \
                                                           ' 15, 730, 575, 660', \
                                                           ' 15, 640, 575, 570', \
                                                           ' 15, 560, 150, 500', \
                                                           ' 15, 480, 575, 390',] \
                                              columns=['','','','','']))

Appending all the tables seems a good option, as the final output will be concatenated into a single DataFrame before being written to an Excel worksheet. However, at the moment I am stuck on the cause of the syntax error.

Jecook
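A minimal sketch of the "all pages in one pass" plan, assuming camelot-py, pandas and openpyxl are installed. The page numbers, areas and shared settings are copied from the question; `COMMON_KWARGS`, `PAGE_AREAS`, `read_all_pages` and `write_single_sheet` are hypothetical names. Keeping one entry per page turns the three near-identical calls into one loop. It also uses `extend` rather than `append`: `read_pdf` returns a `TableList`, and appending that to `obstables._tables` would add the whole list as a single nested element instead of adding its tables.

```python
# Hypothetical helpers: read every Obslog page in one loop, then stack all tables
# into one worksheet. Assumes camelot-py, pandas and openpyxl are installed.

# Settings shared by every page (copied from the question's calls)
COMMON_KWARGS = {
    "flavor": "stream",
    "edge_tol": 500,
    "strip_text": " °, kn, m, µbar, mbar, in³, psi,\n",
}

# One entry per page: the table areas to extract on that page
PAGE_AREAS = {
    "1": [" 15, 750, 575, 680", " 15, 680, 575, 570", " 15, 570, 575, 460",
          " 15, 460, 575, 380", " 15, 380, 575, 300", " 15, 300, 575, 240",
          " 15, 240, 575, 180", " 15, 180, 575, 110"],
    "2": [" 20, 820, 575, 750", " 20, 730, 140, 655",
          " 20, 635, 270, 560", " 20, 540, 270, 470"],
    "3": [" 15, 820, 575, 750", " 15, 730, 575, 660", " 15, 640, 575, 570",
          " 15, 560, 150, 500", " 15, 480, 575, 390"],
}


def read_all_pages(filepath, page_areas=PAGE_AREAS, read_pdf=None):
    """Return a flat list of tables from every page, in page order."""
    if read_pdf is None:
        import camelot  # imported lazily so the loop can be exercised without camelot
        read_pdf = camelot.read_pdf
    tables = []
    for page, areas in page_areas.items():  # dicts keep insertion order (Python 3.7+)
        page_tables = read_pdf(
            filepath,
            pages=page,
            table_areas=areas,
            columns=[""] * len(areas),  # one empty columns entry per area, as in the question
            **COMMON_KWARGS,
        )
        # extend, not append: append would nest the whole TableList as one item
        tables.extend(page_tables)
    return tables


def write_single_sheet(tables, out_path, sheet_name="Obslog"):
    """Concatenate every table's DataFrame and write them to one worksheet."""
    import pandas as pd
    # Camelot DataFrames use integer column labels, so tables of different widths
    # line up on columns 0, 1, 2, ... and shorter tables are padded with NaN.
    combined = pd.concat([t.df for t in tables], ignore_index=True)
    # header=False avoids writing the integer labels as an extra first row
    combined.to_excel(out_path, sheet_name=sheet_name, index=False, header=False)
    return combined
```

Usage would then be `tables = read_all_pages(filepath)` followed by `write_single_sheet(tables, "obslog.xlsx")`.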

1 Answer


After going through all the code, I found the error was a simple rookie mistake! I was trying to find the syntax error on the first line of the table_areas definition, but in fact I had put the comma on the last line of the definition before the ']' instead of after it. A trailing comma inside the list is legal Python; the real problem was that nothing separated the table_areas argument from the following columns argument. I was slightly misled by the error message, which pointed to the first line of the table_areas definition rather than the last. Because I had copied and pasted the code, this was also why the 'tables.append' option failed.

    ' 15, 480, 575, 390',] \

which should have read

    ' 15, 480, 575, 390'], \

Jecook
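The fix can be checked without Camelot or the PDF by asking Python's parser directly. This sketch uses the standard-library `ast.parse` on trimmed-down copies of the call (`is_valid_python` is a hypothetical helper name, and the backslash continuations are dropped because they are redundant inside brackets). On recent Python versions the message usually reads "invalid syntax. Perhaps you forgot a comma?", and the highlighted range can start at the beginning of the offending expression rather than at the line where the comma is missing, which would explain the misleading pointer.

```python
import ast


def is_valid_python(source):
    """Return True if source parses, False (after printing why) on SyntaxError."""
    try:
        ast.parse(source)  # parse only: undefined names like filepath are fine
    except SyntaxError as err:
        print(f"SyntaxError on line {err.lineno}: {err.msg}")
        return False
    return True


# Comma inside the bracket and none after it: the two arguments run together
broken = """
camelot.read_pdf(filepath,
                 table_areas=[' 15, 560, 150, 500',
                              ' 15, 480, 575, 390',]
                 columns=['', ''])
"""

# Comma moved after the bracket: parses fine
fixed = """
camelot.read_pdf(filepath,
                 table_areas=[' 15, 560, 150, 500',
                              ' 15, 480, 575, 390'],
                 columns=['', ''])
"""

# A trailing comma inside the list is harmless once the comma after ']' is present
trailing_ok = """
camelot.read_pdf(filepath,
                 table_areas=[' 15, 560, 150, 500',
                              ' 15, 480, 575, 390',],
                 columns=['', ''])
"""

print(is_valid_python(broken))       # False (after printing the SyntaxError)
print(is_valid_python(fixed))        # True
print(is_valid_python(trailing_ok))  # True
```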