
Using the following call, I can get the bounding boxes of the tables in my electronic PDF:

import tabula
tabula.read_pdf(file, stream=True, guess=True, lattice=False, multiple_tables=True, output_format="json", pages=pg_num)

However, I want to draw the detected bounding boxes on an image of the page. I realised that the x, y, w, h values of the tabula bounding boxes do not match the pixel coordinates of the image I get by converting the PDF with this script:

import numpy as np
from pdf2image import convert_from_path

pages = convert_from_path(file)
open_cv_image = np.array(pages[pg_num - 1])  # RGB channel order, not OpenCV's BGR

Any thoughts on how to map the locations reported by tabula onto the image exported from the PDF?
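For reference, here is a minimal sketch of the kind of conversion I suspect is needed. It rests on two assumptions I have not verified against my own file: that tabula reports `top`, `left`, `width` and `height` in PDF points (1/72 inch) measured from the top-left corner of the page, and that pdf2image renders at its default of 200 DPI when no `dpi` argument is given. The helper name `tabula_box_to_pixels` is made up for illustration.

```python
def tabula_box_to_pixels(box, dpi=200):
    """Scale a tabula box from PDF points to pixel corners (x1, y1, x2, y2).

    Assumes `box` has "top", "left", "width", "height" in points (1/72 inch)
    with a top-left origin, and that the image was rendered at `dpi`.
    """
    scale = dpi / 72.0  # pixels per PDF point
    x1 = round(box["left"] * scale)
    y1 = round(box["top"] * scale)
    x2 = round((box["left"] + box["width"]) * scale)
    y2 = round((box["top"] + box["height"]) * scale)
    return x1, y1, x2, y2


# Hypothetical box, shaped like one entry of the JSON output
example_box = {"top": 72.0, "left": 36.0, "width": 144.0, "height": 72.0}
corners = tabula_box_to_pixels(example_box, dpi=200)
```

If this is the right idea, passing the same value explicitly, as in `convert_from_path(file, dpi=200)`, would keep the two sides in step, and the returned corners could go straight into `cv2.rectangle(open_cv_image, (x1, y1), (x2, y2), (255, 0, 0), 2)`. It would not account for rotated pages or a crop box that differs from the media box.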

Christoph Rackwitz
skw1990

0 Answers