Obtain the position of tables in a PDF and plot the bounding boxes on the image

Asked Feb 18 '23 at 08:26

Active Feb 18 '23 at 10:38

Viewed 102 times

With the following call, I can get the bounding boxes of the tables in my electronic (text-based) PDF:

import tabula

tables = tabula.read_pdf(file, stream=True, guess=True, lattice=False, multiple_tables=True, output_format="json", pages=pg_num)

However, I want to draw the detected bounding boxes on an image of the page. I realised that the coordinates (x, y, w, h) of the tabula bounding boxes do not match the pixel positions in the images converted from the PDF using this script:

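import numpy as np
from pdf2image import convert_from_path

pages = convert_from_path(file)  # one PIL image per page, rendered at 200 DPI by default
open_cv_image = np.array(pages[pg_num - 1])  # pages are 0-indexed here; the array is in RGB channel order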

Any thoughts on how to map the locations reported by tabula for the PDF onto the corresponding locations in the exported image?
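One likely cause of the mismatch, offered as an assumption rather than a confirmed diagnosis: tabula reports boxes in PDF points (72 per inch, measured from the top-left of the page), while pdf2image renders at 200 DPI unless told otherwise, so the two differ by a factor of dpi / 72. The sketch below shows that conversion. The helper name `points_to_pixels` and the example box are hypothetical, and the key names (`top`, `left`, `width`, `height`) are assumed to match tabula's JSON output.

```python
def points_to_pixels(box, dpi=200):
    """Convert a box given in PDF points to integer pixel corners.

    Assumes `box` has 'top', 'left', 'width' and 'height' keys in points,
    with the origin at the top-left of the page, and that the page image
    was rendered at `dpi` dots per inch.
    """
    scale = dpi / 72.0  # 72 PDF points per inch
    x0 = round(box["left"] * scale)
    y0 = round(box["top"] * scale)
    x1 = round((box["left"] + box["width"]) * scale)
    y1 = round((box["top"] + box["height"]) * scale)
    return x0, y0, x1, y1


# Hypothetical box: 0.5 in from the left, 1 in from the top, 2 in wide, 1 in tall
example_box = {"top": 72.0, "left": 36.0, "width": 144.0, "height": 72.0}
print(points_to_pixels(example_box, dpi=200))  # (100, 200, 500, 400)
```

If this assumption holds, passing the same `dpi` explicitly to `convert_from_path(file, dpi=200)` keeps the two sides in step, and the returned corners could be drawn with `cv2.rectangle(open_cv_image, (x0, y0), (x1, y1), (0, 255, 0), 2)`. Pages with a rotation or a crop box different from the media box may need an extra offset that this sketch does not handle.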

edited Feb 18 '23 at 10:38

Christoph Rackwitz

asked Feb 18 '23 at 08:26

skw1990

0 Answers