I am using AWS Rekognition to detect text in a PDF that has been converted to a JPEG. The text in the image is roughly 10-12 pt, as on a regular letter-size page. However, the font changes several times throughout the image.

Are the missed detections and low confidence levels caused by the frequent font changes, or by the small font size?

Essentially, I'd like to know what kind of image and text I need in order to get the best results from a text-detection algorithm.

M Waz

1 Answer

This is from a snapshot of the official documentation:

DetectText API can detect up to 50 words in an image

and to be detected, text must be within +/- 30 degrees orientation of the horizontal axis.

You are trying to extract a full page of text, which is far more than that word limit allows. That's the problem :)
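One way to check whether a response is running into the word limit is to count the WORD detections it contains. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names `summarize_detections` and `detect_text_in_file` are hypothetical, and the `word_limit=50` default is taken from the documentation snapshot quoted above, so the current limit may differ.

```python
def summarize_detections(response, word_limit=50):
    """Summarize a Rekognition DetectText response dict.

    word_limit is an assumption based on the quoted documentation snapshot.
    """
    # DetectText returns both LINE and WORD entries; count only the words.
    words = [d for d in response.get("TextDetections", [])
             if d.get("Type") == "WORD"]
    confidences = [d["Confidence"] for d in words]
    return {
        "word_count": len(words),
        # If the count reaches the limit, the rest of the page was likely dropped.
        "at_limit": len(words) >= word_limit,
        "min_confidence": min(confidences) if confidences else None,
    }


def detect_text_in_file(path):
    """Call Rekognition DetectText on a local JPEG or PNG file."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("rekognition")
    with open(path, "rb") as f:
        return client.detect_text(Image={"Bytes": f.read()})
```

If `at_limit` comes back true for a full page, the missing text is more likely explained by the word limit than by the font.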

AWS now provides the Textract service, which is specifically intended for OCR on images and documents.
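As a rough illustration of the Textract route, here is a minimal sketch that sends a local image to the synchronous `detect_document_text` call and collects the detected lines. It assumes boto3 is installed and AWS credentials are configured; `extract_lines`, `textract_file`, and the `min_confidence` parameter are hypothetical names introduced for this example.

```python
def extract_lines(response, min_confidence=0.0):
    """Return the text of LINE blocks from a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        # Textract also returns PAGE and WORD blocks; keep only whole lines.
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def textract_file(path):
    """Call Textract DetectDocumentText on a local image file."""
    import boto3  # imported here so extract_lines works without boto3

    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.detect_document_text(Document={"Bytes": f.read()})
```

For example, `extract_lines(textract_file("page.jpg"), min_confidence=80.0)` would return only the lines Textract is at least 80% confident about, where `page.jpg` is a placeholder file name.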

Mausam Sharma