Using reportlab, I made two one-page PDFs, each containing a single table.
The data in both tables is this:
data1 = [['00', '', '02', '', '04'],
         ['', '11', '', '13', ''],
         ['20', '', '22', '23', '24'],
         ['30', '31', '32', '', '34']]
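To make the setup reproducible, here is a minimal sketch of how two such files could be generated. It is an assumption about the setup, not the exact script used: the helper `build_pdf`, the file names, the A4 page size, and the 0.5 pt black grid are all hypothetical choices.

```python
data1 = [['00', '', '02', '', '04'],
         ['', '11', '', '13', ''],
         ['20', '', '22', '23', '24'],
         ['30', '31', '32', '', '34']]


def build_pdf(path, data, with_borders):
    """Write a one-page PDF containing a single table (hypothetical helper)."""
    # Imported here so the data can be inspected without reportlab installed.
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import A4
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    table = Table(data)
    if with_borders:
        # GRID draws both the outer box and the inner cell lines.
        table.setStyle(TableStyle([
            ('GRID', (0, 0), (-1, -1), 0.5, colors.black),
        ]))
    SimpleDocTemplate(path, pagesize=A4).build([table])


# Example calls (file names are placeholders):
# build_pdf('pdf1.pdf', data1, with_borders=False)
# build_pdf('pdf2.pdf', data1, with_borders=True)
```

The only difference between the two calls is the `GRID` style, which matches the "identical except for borders" setup.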
The point is to get the rows, including the empty cells. If the table has borders, this works without a problem.
But if the table has no borders, the code below finds no tables at all.
Any ideas why?
As I said, the PDFs are identical except that pdf1 has no borders and pdf2 does.
import pdfplumber

# path2pdf and savename1 are defined earlier (directory and file name of pdf1)
with pdfplumber.open(path2pdf + savename1) as pdf1:
    # Get the first page of the document
    page = pdf1.pages[0]
    # Get the text of the page
    text = page.extract_text()
    # Get all the tables on this page
    tables = page.extract_tables()
    # Loop over the tables
    for table in tables:
        # Loop over the rows of each table
        for data in table:
            print(data)
If I open pdf2 instead of pdf1, I get the required result.
EDIT: I tried the following, but I get an error. I'm not sure how the arguments should be formatted:
pdf_table = page.extract_tables(vertical_strategy='text', horizontal_strategy='text')
Traceback (most recent call last):
  File "/usr/lib/python3.8/idlelib/run.py", line 559, in runcode
    exec(code, self.locals)
  File "<pyshell#70>", line 1, in <module>
TypeError: extract_tables() got an unexpected keyword argument 'vertical_strategy'
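A possible reason for the TypeError, going by the pdfplumber documentation: the strategies are passed inside a `table_settings` dictionary rather than as direct keyword arguments. Below is a minimal sketch under that assumption. The names `TEXT_TABLE_SETTINGS` and `extract_text_tables` are hypothetical, and it has not been checked against the two PDFs described here.

```python
# Strategy options go inside one dict, which is passed as table_settings.
TEXT_TABLE_SETTINGS = {
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
}


def extract_text_tables(pdf_path, page_index=0):
    """Extract tables from one page using text alignment instead of ruling lines.

    Hypothetical helper; pdfplumber is imported lazily so the settings
    can be inspected without the library installed.
    """
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        return page.extract_tables(table_settings=TEXT_TABLE_SETTINGS)
```

It would be called as `extract_text_tables(path2pdf + savename1)`. The default strategies look for drawn lines, which would explain why only the bordered PDF yields a table. Whether the text strategy returns the empty cells as expected depends on how the columns are inferred from word positions, so the result may still need tuning.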