
I have a table like the one below, with an unknown number of description lines per row. A row can have zero, 1, 2, 5, or more lines:

[image: the invoice table, with the blank gaps between rows highlighted by red rectangles]

(I removed all sensitive information.)

and I use:

import pdfplumber

with pdfplumber.open("invoice.pdf") as pdf:
    for page in pdf.pages:
        # extract_table() returns the largest table on the page as a list of rows
        print(page.extract_table())

which does extract all the data from the table, but it treats the second column as a single row. I want to split the lines of the second column (or, better, of all columns) at each small blank gap between rows; I marked those gaps with red rectangles to highlight them.

I know that I need to use table_settings={}, but I can't figure out yet which property (or properties) to use.

What I tried:

print(page.extract_table(table_settings={
    "horizontal_strategy": "text",
    "snap_y_tolerance": 3,
    "keep_blank_chars": True,
}))

Which, again, splits the rows wherever it wants.

So, is it possible to extract a mixed (partially borderless) table?
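To make the goal concrete, here is a rough sketch of the behaviour I am after, not a confirmed solution. It measures the vertical gaps between words and passes the middle of each large gap to pdfplumber as an explicit horizontal line. The helper names gap_lines and extract_rows and the min_gap value are made up for illustration, and the sketch assumes the page has already been cropped to the table and that the column borders are drawn in the PDF.

```python
def gap_lines(words, min_gap=4.0):
    """Return the y position in the middle of every vertical gap wider than min_gap.

    `words` is a list of dicts with "top" and "bottom" keys, which is the
    shape that pdfplumber's page.extract_words() returns.
    """
    if not words:
        return []
    spans = sorted((w["top"], w["bottom"]) for w in words)
    lines = []
    band_bottom = spans[0][1]  # lowest edge of the text seen so far
    for top, bottom in spans[1:]:
        if top - band_bottom > min_gap:
            # Blank band between two row groups: put a separator in its middle
            lines.append((band_bottom + top) / 2)
        band_bottom = max(band_bottom, bottom)
    return lines


def extract_rows(page, min_gap=4.0):
    """Hypothetical helper: split rows at blank gaps instead of drawn lines.

    Assumes `page` is a pdfplumber page already cropped to the table.
    """
    words = page.extract_words()
    if not words:
        return None
    # Close the table with one line above and one below all of the text
    top = min(w["top"] for w in words)
    bottom = max(w["bottom"] for w in words)
    ys = [top] + gap_lines(words, min_gap) + [bottom]
    return page.extract_table(table_settings={
        "horizontal_strategy": "explicit",
        "explicit_horizontal_lines": ys,
        # vertical_strategy is left at its default ("lines"),
        # assuming the column borders exist in the PDF
    })
```

The min_gap of 4.0 (in PDF points) is a guess; it would need to be larger than the normal spacing between lines inside one description and smaller than the blank gap between rows.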

Cristian F.
