I am using tabula-py to extract tables from a PDF by passing the exact area that contains each table.

tabula-py expects area coordinates in points (72 per inch), but my coordinates are in pixels at 300 dpi, produced by a trained ML model.

Is there a way to use my area (in 300 dpi pixel coordinates) with tabula-py's extraction methods (read_pdf or convert_into, which take an area in 72 dpi point coordinates)?

theduck
Dach Ch
  • Isn't this a simple application of the rule of three? – mkl Oct 11 '19 at 17:07
  • @mkl I used the following conversion: pdfX = pixel * 72 / dpi, for example pdf_upper_left = image_upper_left * 72 / 300. With it I am not getting the correct measurements. – Dach Ch Oct 14 '19 at 12:39
  • In that case are you sure the only difference between the coordinates is the size of a unit? E.g. the origin might be at completely different locations... – mkl Oct 14 '19 at 14:16
  • Yes, you're right: the coordinates were reversed, and now everything works fine. Thanks! – Dach Ch Oct 17 '19 at 09:23
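
For anyone landing here, the conversion discussed in the comments can be sketched roughly as follows. This is a minimal sketch under assumptions, not the asker's actual code: it assumes the model returns boxes as (x1, y1, x2, y2) in pixels with the origin at the top-left of a full-page image rendered at 300 dpi, and the helper name pixels_to_points and the file name document.pdf are hypothetical. The one thing taken from tabula-py itself is that area is given as (top, left, bottom, right) in points, so the y values come before the x values, which is an easy place to get the order reversed.

```python
def pixels_to_points(box_px, dpi=300.0):
    """Convert a pixel box to a tabula-py style area in points.

    box_px is assumed to be (x1, y1, x2, y2) in pixels, measured from
    the top-left corner of a full-page image rendered at `dpi`.
    Returns [top, left, bottom, right] in points (1 point = 1/72 inch).
    """
    x1, y1, x2, y2 = box_px
    # Rule of three: points = pixels * 72 / dpi.
    xs = sorted((x1 * 72.0 / dpi, x2 * 72.0 / dpi))
    ys = sorted((y1 * 72.0 / dpi, y2 * 72.0 / dpi))
    # tabula-py wants y first: top, left, bottom, right.
    return [ys[0], xs[0], ys[1], xs[1]]


# Example box from a hypothetical 300 dpi page image.
area = pixels_to_points((300, 600, 1500, 1200))
print(area)

# With tabula-py installed, the area could then be passed along like this:
# import tabula
# tables = tabula.read_pdf("document.pdf", pages=1, area=area)
```

If the extracted region still looks shifted, it is worth checking that the image was rendered from the whole page (no cropping or margins removed) and that the dpi used for rendering really matches the value passed in.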

0 Answers