I would like to extract tables from a multi-page PDF. Because of how the tables are laid out, I need to pass flavor='stream' and table_areas to read_pdf for my table to be detected properly. My problem is that the table is at a different position on each page (the first page has an address header and the others do not).
I have tried passing several areas to the read_pdf function, as follows:
camelot.read_pdf(file, pages='all', flavor='stream', table_areas=['60, 740, 580, 50','60, 470, 580, 50'])
but this results in two tables per page. How can I specify table_areas for each page separately?
I have also tried calling read_pdf several times with different pages/table_areas values, but then I cannot append the results together into a single object:
tables = camelot.read_pdf(file, pages='1', flavor='stream', table_areas=['60, 470, 580, 50'])
tables.append(camelot.read_pdf(file, pages='2-end', flavor='stream', table_areas=['60, 740, 580, 50']))
gives an error, because the object returned by read_pdf has no append method.
Is there a way to concatenate the results of several calls to the read_pdf function?
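For reference, here is a minimal sketch of the kind of workaround I have in mind. It calls read_pdf once per page range and collects the tables into a plain Python list instead of trying to append to the returned object. It assumes that the object returned by read_pdf can be iterated like a sequence of tables, which I have not verified for every Camelot version. The names read_tables_by_area, page_areas and reader are my own and are not part of Camelot.

```python
def read_tables_by_area(path, page_areas, reader=None):
    """Run one read call per (pages, table_areas) pair and merge the results.

    Hypothetical helper, not part of Camelot.
    page_areas: list of (pages, table_areas) pairs,
                e.g. [('1', ['60,470,580,50']), ('2-end', ['60,740,580,50'])]
    reader:     callable with the same keyword arguments as camelot.read_pdf;
                defaults to camelot.read_pdf itself.
    """
    if reader is None:
        # Imported lazily so the helper can be exercised without Camelot installed.
        import camelot
        reader = camelot.read_pdf

    combined = []  # plain list, so extend() is available
    for pages, areas in page_areas:
        result = reader(path, pages=pages, flavor='stream', table_areas=areas)
        # Assumption: the result can be iterated table by table.
        combined.extend(result)
    return combined
```

With this, something like read_tables_by_area(file, [('1', ['60,470,580,50']), ('2-end', ['60,740,580,50'])]) would return an ordinary list of tables. If a single DataFrame is the goal, the df attribute of each table could then presumably be combined with pandas.concat.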