
I am using Amazon Textract to detect text (raw text, forms, and tables).

I am uploading a PDF for that.

I use bounding box co-ordinates to pick a value out of the raw text. This worked at first, but after a few days the bounding box co-ordinates for that particular block changed, and my logic stopped working.

Does anyone know why those co-ordinates are changing?

This is the logic I applied after identifying the co-ordinates:

if ((item.Geometry.BoundingBox.Top >= 0.92379182 && item.Geometry.BoundingBox.Top <= 0.96)
    && (item.Geometry.BoundingBox.Left >= 0.02470588 && item.Geometry.BoundingBox.Left <= 0.29)
    && (item.Geometry.BoundingBox.Height >= 0.001 && item.Geometry.BoundingBox.Height <= 0.054545)
    && (item.Geometry.BoundingBox.Width >= 0.001 && item.Geometry.BoundingBox.Width <= 0.16))
{
    text = text + " " + item.Text;
}
OCR

1 Answer


The machine learning models behind Textract are subject to change. Although regression tests are in place to ensure that the overall quality doesn't get worse, that doesn't mean the results will never change. This applies especially to bounding boxes: as long as the regions of interest are still bounded correctly, a slight change in coordinates may not be considered a regression.
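
Since small shifts are to be expected, one way to make a coordinate filter like the one in the question less brittle is to pad each range by a tolerance instead of comparing against exact observed values. The following is only a rough sketch in JavaScript over plain objects shaped like Textract blocks. The names `RANGES`, `TOLERANCE`, `withinPadded`, `matchesRegion` and `collectText` are made up for illustration and are not part of the Textract API, the tolerance of 0.01 is an assumption you would need to tune for your documents, and the sample blocks are invented values.

```javascript
// Sketch only: helper names and the tolerance are illustrative assumptions,
// not part of the Textract API.

// Ranges copied from the question's filter (ratios of page height/width).
const RANGES = {
  Top: [0.92379182, 0.96],
  Left: [0.02470588, 0.29],
  Height: [0.001, 0.054545],
  Width: [0.001, 0.16],
};

// Assumed padding applied to both ends of every range.
const TOLERANCE = 0.01;

// True if value lies inside [min - tolerance, max + tolerance].
function withinPadded(value, [min, max], tolerance) {
  return value >= min - tolerance && value <= max + tolerance;
}

// True if every bounding box field of the block falls in its padded range.
function matchesRegion(block, ranges, tolerance) {
  const box = block.Geometry.BoundingBox;
  return Object.entries(ranges).every(([key, range]) =>
    withinPadded(box[key], range, tolerance)
  );
}

// Joins the Text of all matching blocks, separated by single spaces.
function collectText(blocks, ranges = RANGES, tolerance = TOLERANCE) {
  return blocks
    .filter((b) => b.Text !== undefined && matchesRegion(b, ranges, tolerance))
    .map((b) => b.Text)
    .join(" ");
}

// Invented sample blocks: the first two sit slightly above the original
// Top bound of 0.92379182, as if the model had shifted the boxes a little.
const blocks = [
  { Text: "Total:", Geometry: { BoundingBox: { Top: 0.9221, Left: 0.0251, Height: 0.012, Width: 0.05 } } },
  { Text: "42.00", Geometry: { BoundingBox: { Top: 0.9225, Left: 0.081, Height: 0.012, Width: 0.04 } } },
  { Text: "Header", Geometry: { BoundingBox: { Top: 0.05, Left: 0.4, Height: 0.02, Width: 0.2 } } },
];

console.log(collectText(blocks)); // padded ranges
console.log(collectText(blocks, RANGES, 0)); // exact ranges, as in the question
```

With the padded ranges the two shifted blocks still match, while the exact ranges (tolerance 0) reject them. Too large a tolerance will start picking up neighbouring blocks, so the value is a trade-off rather than a fix.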

leetcode269