tabula-py: extract tables by area using pixel coordinates at 300 dpi

Asked Oct 11 '19 at 09:27

Active Oct 11 '19 at 12:43

Viewed 1,220 times

I am using tabula-py to extract tables from a PDF by passing the exact area that contains each table.

tabula-py expects area coordinates in points (72 per inch), but my coordinates are in pixels at 300 dpi, taken from the output of a trained ML model.

Is there a way to use my areas (pixel locations at 300 dpi) with tabula-py's extraction methods (read_pdf or convert_into, which take an area in 72 dpi point coordinates)?

edited Oct 11 '19 at 12:43

theduck


asked Oct 11 '19 at 09:27

Dach Ch


Isn't this a simple application of the rule of three? – mkl Oct 11 '19 at 17:07
@mkl I have used the following conversion: pdfX = pixel * 72 / dpi. When I use it, I do not get the correct measurements. Example: pdf_upper_left = image_upper_left * 72 / 300 – Dach Ch Oct 14 '19 at 12:39
In that case are you sure the only difference between the coordinates is the size of a unit? E.g. the origin might be at completely different locations... – mkl Oct 14 '19 at 14:16
Yes, you're right, the coordinates were reversed and now all works fine. Thanks!!! – Dach Ch Oct 17 '19 at 09:23
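
For later readers, here is a minimal sketch of the conversion discussed in the comments above. It assumes the model returns boxes as (x1, y1, x2, y2) in pixels, measured from the top-left corner of a full-page image rendered at 300 dpi; that box format and the helper name pixels_to_points are assumptions, not something stated in the question. tabula-py's area argument takes the values in the order top, left, bottom, right, in points, so the x and y values have to be swapped as well as scaled, which may be the "reversed" coordinates mentioned above.

```python
def pixels_to_points(box_px, dpi=300):
    """Convert a pixel box to tabula-py's area order, in points.

    box_px is assumed to be (x1, y1, x2, y2) with the origin at the
    top-left of the page image. The result is [top, left, bottom, right].
    """
    x1, y1, x2, y2 = box_px
    # 1 inch = 72 points = `dpi` pixels, so points = pixels * 72 / dpi.
    to_pt = lambda v: v * 72 / dpi
    # Reorder from (x, y) pairs to tabula's (top, left, bottom, right).
    return [to_pt(y1), to_pt(x1), to_pt(y2), to_pt(x2)]


area = pixels_to_points((300, 600, 1500, 900))
print(area)  # [144.0, 72.0, 216.0, 360.0]

# Hypothetical usage with tabula-py (file name and page are placeholders):
# import tabula
# tables = tabula.read_pdf("document.pdf", pages=1, area=area)
```

If the image was cropped or the page was rotated before being passed to the model, an extra offset or rotation step would be needed; this sketch does not cover that.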


0 Answers