I have a test PDF file containing just a 3x3 table that is properly tagged with table headings and the like. What I want to do is extract the structure of the table. For example:

left center right
One Two Three

If that table were in the PDF, I want to be able to determine programmatically that it has three headers ("left", "center", "right") and one row of data ("One", "Two", "Three").

I am using fitz (PyMuPDF), and when I run this code:

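import fitz  # PyMuPDF

doc = fitz.open("test.pdf")        # placeholder path to the test PDF
for page in doc:
    tp = page.get_textpage()       # text page for the current page
    html = tp.extractHTML()        # page content as an HTML string
    print(html)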

The output seems to drop all of the table markup and contains only paragraph and div tags. What am I doing wrong?
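To make the goal concrete, here is a minimal sketch of the result I am after. It assumes the extracted HTML did contain real table tags, which is not what I currently get. The `sample_html` string is hand-written for illustration (it is not fitz output), and the `TableShape` class is a hypothetical helper built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class TableShape(HTMLParser):
    """Hypothetical helper: collect header cells and data rows from table markup."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of every <th> cell
        self.rows = []      # one list of <td> texts per data row
        self._cell = None   # tag name of the cell currently being read
        self._text = ""     # text accumulated for the current cell
        self._row = []      # <td> texts of the current <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = tag
            self._text = ""

    def handle_data(self, data):
        if self._cell:
            self._text += data

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell == tag:
            target = self.headers if tag == "th" else self._row
            target.append(self._text.strip())
            self._cell = None
        elif tag == "tr" and self._row:
            # Only rows that contained <td> cells count as data rows.
            self.rows.append(self._row)
            self._row = []


# Hand-written sample, NOT real fitz output.
sample_html = (
    "<table>"
    "<tr><th>left</th><th>center</th><th>right</th></tr>"
    "<tr><td>One</td><td>Two</td><td>Three</td></tr>"
    "</table>"
)

shape = TableShape()
shape.feed(sample_html)
print(len(shape.headers), "headers:", shape.headers)
print(len(shape.rows), "data row(s):", shape.rows)
```

In other words, I would like to end up with something like `shape.headers` and `shape.rows` for the table in my PDF.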

Mat