I have a table like this one, with an unknown number of description lines per row: a row can have zero, 1, 2, 5, or more lines.
(I removed all sensitive information.)
I use:
import pdfplumber

with pdfplumber.open("invoice.pdf") as pdf:
    pages = pdf.pages
    for page in pages:
        # Extract the table on this page with the default settings
        page.extract_table()
which does extract all the data from the table, but it treats the second column as a single row. I want to split the lines of the second column (or, better, of all columns) at the small blank rows, which I have highlighted with red rectangles.
I know that I need to use table_settings={}, but I can't figure out yet which property (or properties) to use.
What I tried:
print(page.extract_table(table_settings={
    "horizontal_strategy": "text",
    "snap_y_tolerance": 3,
    "keep_blank_chars": True,
}))
which, again, splits the rows wherever it wants.
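For illustration, the "split at the small blank rows" idea could be sketched roughly like this. It is only a sketch under assumptions: `find_row_gaps` and `min_gap` are hypothetical names (not part of pdfplumber), the coordinates are made up, and the input is assumed to be a list of word dicts with `top` and `bottom` keys, like the ones `page.extract_words()` returns.

```python
def find_row_gaps(words, min_gap=4.0):
    """Return the y-coordinates of the middle of each vertical blank gap.

    `words` is a list of dicts with numeric "top" and "bottom" keys.
    `min_gap` (hypothetical threshold, in PDF points) is the smallest
    blank height that counts as a row separator.
    """
    # Sort the vertical extents of all words from top to bottom.
    spans = sorted((w["top"], w["bottom"]) for w in words)
    gaps = []
    if not spans:
        return gaps
    current_bottom = spans[0][1]
    for top, bottom in spans[1:]:
        # A large enough jump between the lowest text seen so far and the
        # next word's top is treated as a blank separator row.
        if top - current_bottom >= min_gap:
            gaps.append((current_bottom + top) / 2)
        current_bottom = max(current_bottom, bottom)
    return gaps


# Made-up coordinates: two tightly spaced lines, then a blank gap, then a new row.
sample_words = [
    {"text": "Item A", "top": 100.0, "bottom": 110.0},
    {"text": "detail 1", "top": 112.0, "bottom": 122.0},
    {"text": "Item B", "top": 130.0, "bottom": 140.0},
]
print(find_row_gaps(sample_words))  # [126.0]
```

With pdfplumber, the returned positions (together with the table's top and bottom edges) could presumably be passed as "explicit_horizontal_lines" along with "horizontal_strategy": "explicit" in table_settings. This is untested on the invoice above, and `min_gap` would need tuning to the actual line spacing.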
So, is it possible to extract a partially borderless table?