I'm building a small application that uses Amazon Textract for OCR, and I'm trying to find a way to get the coordinates of each character within each word.
Is there any way to get character-level coordinates or character data?
For each 'word', yes there is; Textract returns geometry per word, not per character. The documentation specifies how:
Using Amazon Textract: Item Location on a Document Page
https://docs.aws.amazon.com/textract/latest/dg/text-location.html
Amazon Textract operations return the location and geometry of items found on a document page. DetectDocumentText and GetDocumentTextDetection return the location and geometry for lines and words, while AnalyzeDocument and GetDocumentAnalysis return the location and geometry of key-value pairs, tables, cells, and selection elements.
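As a rough sketch of what that looks like in practice, the snippet below walks a DetectDocumentText-style response and keeps the text and BoundingBox of every WORD block. The `sample` dict is hand-written for illustration (not real Textract output), and `word_boxes` is a hypothetical helper name; with boto3 you would get a real response from `detect_document_text`.

```python
from typing import Any, Dict, List


def word_boxes(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the text and BoundingBox of every WORD block in a Textract response."""
    words = []
    for block in response.get("Blocks", []):
        # DetectDocumentText returns PAGE, LINE and WORD blocks; keep only words.
        if block.get("BlockType") != "WORD":
            continue
        box = block["Geometry"]["BoundingBox"]
        words.append({
            "text": block.get("Text", ""),
            "left": box["Left"],
            "top": box["Top"],
            "width": box["Width"],
            "height": box["Height"],
        })
    return words


# Hand-written response fragment for illustration only (not real Textract output).
sample = {
    "Blocks": [
        {"BlockType": "PAGE",
         "Geometry": {"BoundingBox": {"Left": 0.0, "Top": 0.0, "Width": 1.0, "Height": 1.0}}},
        {"BlockType": "WORD", "Text": "Hello",
         "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.15, "Height": 0.05}}},
    ]
}

if __name__ == "__main__":
    # A real response could come from something like:
    #   boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})
    for w in word_boxes(sample):
        print(w)
```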
To determine where an item is on a document page, use the bounding box (Geometry) information that's returned by the Amazon Textract operation in a Block object. The Geometry object contains two types of location and geometric information for detected items:
An axis-aligned BoundingBox object that contains the top-left coordinate and the width and height of the item.
A polygon object that describes the outline of the item, specified as an array of Point objects that contain X (horizontal axis) and Y (vertical axis) document page coordinates of each point.
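If you only have the Polygon (for example for slightly rotated text), a minimal sketch of recovering an axis-aligned extent from its points is below. The point values are made up, and `polygon_extent` is a hypothetical helper; the result is in the same 0 to 1 page ratios as the input.

```python
from typing import Dict, List, Tuple


def polygon_extent(points: List[Dict[str, float]]) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of the smallest axis-aligned box around a Polygon."""
    xs = [p["X"] for p in points]
    ys = [p["Y"] for p in points]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


# Hypothetical Polygon of a slightly rotated word, as ratios of the page size.
polygon = [
    {"X": 0.10, "Y": 0.20},
    {"X": 0.25, "Y": 0.21},
    {"X": 0.25, "Y": 0.26},
    {"X": 0.10, "Y": 0.25},
]

if __name__ == "__main__":
    print(polygon_extent(polygon))
```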
You can use geometry information to draw bounding boxes around detected items. For an example that uses BoundingBox and Polygon information to draw boxes around lines and vertical lines at the start and end of each word, see Detecting Document Text with Amazon Textract.
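Since Textract stops at the word level, the closest you can get to per-character positions is to estimate them from each word's start and end. The sketch below splits a word's BoundingBox into equal-width slices, one per character. This is an approximation, not Textract output: it assumes every character has the same width (roughly true only for monospaced fonts) and upright text. `approximate_char_boxes` and `CharBox` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: float
    top: float
    width: float
    height: float


def approximate_char_boxes(text: str, box: Dict[str, float]) -> List[CharBox]:
    """Estimate per-character boxes by dividing a word's BoundingBox evenly.

    Assumes equal character widths and unrotated text; real glyph widths vary.
    """
    if not text:
        return []
    step = box["Width"] / len(text)
    return [
        CharBox(ch, box["Left"] + i * step, box["Top"], step, box["Height"])
        for i, ch in enumerate(text)
    ]


if __name__ == "__main__":
    word_box = {"Left": 0.1, "Top": 0.2, "Width": 0.2, "Height": 0.05}
    for c in approximate_char_boxes("Hi", word_box):
        print(c)
```

For proportional fonts the error grows with the word length, so treat these boxes as hints rather than exact glyph positions.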
Bounding Box
A bounding box (BoundingBox) has the following properties:
Height – The height of the bounding box as a ratio of the overall document page height.
Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.
Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.
Width – The width of the bounding box as a ratio of the overall document page width.
Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).
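To turn those ratios back into pixels, multiply by the image dimensions. A minimal sketch, reusing the 700 x 200 example above; the Width and Height values are made up for illustration, and `to_pixels` is a hypothetical helper.

```python
from typing import Dict, Tuple


def to_pixels(box: Dict[str, float], image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Convert a ratio-based BoundingBox to (left, top, width, height) in pixels."""
    return (
        round(box["Left"] * image_width),
        round(box["Top"] * image_height),
        round(box["Width"] * image_width),
        round(box["Height"] * image_height),
    )


if __name__ == "__main__":
    # Left/Top from the documentation example; Width/Height are illustrative.
    box = {"Left": 0.5, "Top": 0.25, "Width": 0.2, "Height": 0.1}
    print(to_pixels(box, 700, 200))  # (350, 50, 140, 20)
```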