My PDF contains 16 tables on 3 pages, which I want to output to an Excel file as a single worksheet using Camelot. I can extract each page individually with no problems, but I cannot figure out how to handle all 3 pages in one pass. My code is shown below:
# Read Obslog Page 1 to extract all the required tables
obstables = camelot.read_pdf(filepath,
                             pages='1', \
                             flavor='stream', \
                             edge_tol=500, \
                             strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                             table_areas=[' 15, 750, 575, 680', \
                                          ' 15, 680, 575, 570', \
                                          ' 15, 570, 575, 460', \
                                          ' 15, 460, 575, 380', \
                                          ' 15, 380, 575, 300', \
                                          ' 15, 300, 575, 240', \
                                          ' 15, 240, 575, 180', \
                                          ' 15, 180, 575, 110'], \
                             columns=['','','','','','','',''])
# Read Obslog Page 2 to extract all the required tables
obstables1 = camelot.read_pdf(filepath,
                              pages='2', \
                              flavor='stream', \
                              edge_tol=500, \
                              strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                              table_areas=[' 20, 820, 575, 750', \
                                           ' 20, 730, 140, 655', \
                                           ' 20, 635, 270, 560', \
                                           ' 20, 540, 270, 470'], \
                              columns=['','','',''])
# Read Obslog Page 3 to extract all the required tables
obstables2 = camelot.read_pdf(filepath,
                              pages='3', \
                              flavor='stream', \
                              edge_tol=500, \
                              strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                              table_areas=[' 15, 820, 575, 750', \
                                           ' 15, 730, 575, 660', \
                                           ' 15, 640, 575, 570', \
                                           ' 15, 560, 150, 500', \
                                           ' 15, 480, 575, 390',] \
                              columns=['','','','',''])
When I try to execute the script, the first line of 'table_areas' in the page 3 call (the one starting ' 15, 820, 575, 750') gives me the following syntax error:
table_areas=[' 15, 820, 575, 750',
^^^^^^^^^^^^^^^^^^^^^^^^
I cannot see any syntax problem with this line.
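One thing worth checking, offered as a guess rather than a confirmed diagnosis: in the page 3 call the closing `]` of `table_areas` is followed directly by `columns=` with no comma in between (the comma sits inside the brackets instead). Recent Python versions may place the carets at the start of the offending argument rather than at the spot where the comma is missing, which would explain why the highlighted line itself looks fine. The sketch below reproduces that shape in isolation; `read` and `path` are hypothetical stand-ins, not Camelot names, and the code is only parsed, never executed.

```python
import ast

# Hypothetical stand-in for the page 3 call: `read` and `path` are made up,
# only the layout of the keyword arguments mirrors the script.
missing_comma = (
    "read(path,\n"
    "     table_areas=['15, 820, 575, 750',\n"
    "                  '15, 480, 575, 390',] \\\n"
    "     columns=['', ''])\n"
)

# Same source, but with a comma placed after the closing bracket.
with_comma = missing_comma.replace("] \\", "], \\")


def parses(source):
    """Return True if Python can parse the source, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print("comma inside the brackets only:", parses(missing_comma))
print("comma after the closing bracket:", parses(with_comma))
```

If this is the cause, the same pattern would also explain the identical error in the `_tables.append` variant, since its page 3 call ends `table_areas` the same way.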
I get the same error if I try to use the 'tables.append' option (as suggested by Anakin87 on 12/7/2021 in an answer to a similar post). In this case I replace the camelot calls for pages 2 and 3 with the following code:
obstables._tables.append(camelot.read_pdf(filepath,
                                          pages='2', \
                                          flavor='stream', \
                                          edge_tol=500, \
                                          strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                          table_areas=[' 20, 820, 575, 750', \
                                                       ' 20, 730, 140, 655', \
                                                       ' 20, 635, 270, 560', \
                                                       ' 20, 540, 270, 470'], \
                                          columns=['','','','']))
obstables._tables.append(camelot.read_pdf(filepath,
                                          pages='3', \
                                          flavor='stream', \
                                          edge_tol=500, \
                                          strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
                                          table_areas=[' 15, 820, 575, 750', \
                                                       ' 15, 730, 575, 660', \
                                                       ' 15, 640, 575, 570', \
                                                       ' 15, 560, 150, 500', \
                                                       ' 15, 480, 575, 390',] \
                                          columns=['','','','','']))
Appending all the tables seems a good option, as the final output will be concatenated into a single dataframe before being written to an Excel worksheet. However, at the moment I am stuck on the cause of the syntax error.
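To make the "one pass" goal concrete, here is a minimal sketch of the structure I have in mind, assuming the per-page settings stay exactly as in the three calls above. `PAGE_SETTINGS` and `read_all_pages` are hypothetical names introduced for illustration; the reader function is passed in as an argument so the loop can be tried without a PDF. It collects the tables in a plain list instead of touching the private `_tables` attribute.

```python
# Per-page arguments copied from the three calls above.
PAGE_SETTINGS = {
    '1': {'table_areas': [' 15, 750, 575, 680', ' 15, 680, 575, 570',
                          ' 15, 570, 575, 460', ' 15, 460, 575, 380',
                          ' 15, 380, 575, 300', ' 15, 300, 575, 240',
                          ' 15, 240, 575, 180', ' 15, 180, 575, 110'],
          'columns': ['', '', '', '', '', '', '', '']},
    '2': {'table_areas': [' 20, 820, 575, 750', ' 20, 730, 140, 655',
                          ' 20, 635, 270, 560', ' 20, 540, 270, 470'],
          'columns': ['', '', '', '']},
    '3': {'table_areas': [' 15, 820, 575, 750', ' 15, 730, 575, 660',
                          ' 15, 640, 575, 570', ' 15, 560, 150, 500',
                          ' 15, 480, 575, 390'],
          'columns': ['', '', '', '', '']},
}


def read_all_pages(read_pdf, filepath, page_settings, **common):
    """Call read_pdf once per page and collect every table in one list.

    read_pdf is expected to behave like camelot.read_pdf, i.e. to return
    an iterable of tables for the requested page.
    """
    tables = []
    for page, settings in page_settings.items():
        # Shared arguments (flavor, edge_tol, ...) plus the page-specific ones.
        tables.extend(read_pdf(filepath, pages=page, **common, **settings))
    return tables
```

The intended call would be along the lines of `read_all_pages(camelot.read_pdf, filepath, PAGE_SETTINGS, flavor='stream', edge_tol=500, strip_text=...)`, after which each table's `.df` could be handed to `pandas.concat` before writing the worksheet. I have not run this against the actual PDF.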