I'm using Amazon Textract to perform OCR for a small application, and I need the coordinates of each character within a word.

Is there any way to get character-level coordinates or character data?

John Rotenstein
N00b_Geek

1 Answer

For each word, yes. The documentation explains how:

Using Amazon Textract: Item Location on a Document Page

https://docs.aws.amazon.com/textract/latest/dg/text-location.html

Amazon Textract operations return the location and geometry of items found on a document page. DetectDocumentText and GetDocumentTextDetection return the location and geometry for lines and words, while AnalyzeDocument and GetDocumentAnalysis return the location and geometry of key-value pairs, tables, cells, and selection elements.

To determine where an item is on a document page, use the bounding box (Geometry) information that's returned by the Amazon Textract operation in a Block object. The Geometry object contains two types of location and geometric information for detected items:

An axis-aligned BoundingBox object that contains the top-left coordinate and the width and height of the item.

A polygon object that describes the outline of the item, specified as an array of Point objects that contain X (horizontal axis) and Y (vertical axis) document page coordinates of each point.
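As a minimal sketch of reading that geometry, the snippet below walks a response dictionary and pulls out the BoundingBox and Polygon of each WORD block. It assumes the response has the shape returned by DetectDocumentText (a `Blocks` list whose entries carry `BlockType`, `Text`, and `Geometry`); the helper name `iter_word_geometry` and the sample data are made up for illustration and are not real Textract output.

```python
from typing import Any, Dict, Iterator, List, Tuple

WordGeometry = Tuple[str, Dict[str, float], List[Tuple[float, float]]]


def iter_word_geometry(response: Dict[str, Any]) -> Iterator[WordGeometry]:
    """Yield (text, bounding_box, polygon) for every WORD block in a response."""
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue  # skip PAGE, LINE and other block types
        geometry = block["Geometry"]
        # Polygon is a list of points, each with X and Y page ratios.
        polygon = [(point["X"], point["Y"]) for point in geometry["Polygon"]]
        yield block["Text"], geometry["BoundingBox"], polygon


# Hand-written sample shaped like a DetectDocumentText response (illustrative only).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Geometry": {}},
        {
            "BlockType": "WORD",
            "Text": "Hello",
            "Geometry": {
                "BoundingBox": {"Width": 0.1, "Height": 0.2, "Left": 0.5, "Top": 0.25},
                "Polygon": [
                    {"X": 0.5, "Y": 0.25},
                    {"X": 0.6, "Y": 0.25},
                    {"X": 0.6, "Y": 0.45},
                    {"X": 0.5, "Y": 0.45},
                ],
            },
        },
    ]
}

for text, box, polygon in iter_word_geometry(sample_response):
    print(text, box, polygon)
```

In a real application the dictionary would come from a Textract call instead of the hand-written sample. Note that this only reaches word-level geometry; nothing in the response describes individual characters.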

You can use geometry information to draw bounding boxes around detected items. For an example that uses BoundingBox and Polygon information to draw boxes around lines and vertical lines at the start and end of each word, see Detecting Document Text with Amazon Textract. The example output is similar to the following.

[Image: Detect Document Text example output]

Bounding Box

A bounding box (BoundingBox) has the following properties:

Height – The height of the bounding box as a ratio of the overall document page height.

Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.

Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.

Width – The width of the bounding box as a ratio of the overall document page width.

Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).
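The ratio arithmetic above can be reversed to get pixel coordinates. The following is a small sketch, assuming you already know the pixel size of the input image; the function name `bounding_box_to_pixels` is hypothetical, and the Width and Height values in the example are made up, with only Left and Top taken from the documentation's example.

```python
from typing import Dict, Tuple


def bounding_box_to_pixels(
    box: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert a ratio-based BoundingBox to (left, top, width, height) in pixels."""
    # Left and Width are ratios of the image width.
    left = round(box["Left"] * image_width)
    width = round(box["Width"] * image_width)
    # Top and Height are ratios of the image height.
    top = round(box["Top"] * image_height)
    height = round(box["Height"] * image_height)
    return left, top, width, height


# The documentation's example: a 700 x 200 pixel image, Left 0.5, Top 0.25.
example_box = {"Left": 0.5, "Top": 0.25, "Width": 0.1, "Height": 0.2}
print(bounding_box_to_pixels(example_box, 700, 200))  # (350, 50, 70, 40)
```

The result is rounded to whole pixels, which is usually what drawing or cropping code expects; keep the unrounded products if you need sub-pixel precision.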

GoodJuJu
  • I know about the words; I'm asking about the characters. Your answer is exactly what the documentation says. But thanks anyway. – N00b_Geek Feb 04 '21 at 15:21
  • Unfortunately, character-level coordinates are not currently supported (as of Feb 2021). https://forums.aws.amazon.com/message.jspa?messageID=970443#970443 From AWS Support: `Thanks for using AWS Textract. Currently we don't support providing character coordinates. We will take your feedback and evaluate the feasibility.` – GoodJuJu Feb 05 '21 at 00:17