The detect_document_text method of the boto3 Textract client fails on my local machine, but only for images of a certain size. The error message is cryptic, so I am at a loss as to how to debug this further. The documentation does not mention any other parameters I could pass.

I am passing the image bytes directly to the method call:

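import boto3

# Textract client using the default credentials and region
client = boto3.client('textract')

response = client.detect_document_text(
  Document={
    'Bytes': img,  # raw image bytes
  }
)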

The variable img is a bytes object (img = b'...') with len(img) == 22023165. The image is 2.4 MB.

Jeremy
  • 22023165 is not 2.4 MB, it's more like 22 MB. – Jim Foye Jan 12 '22 at 12:11
  • Right. When I save the image to a file, it is only 2.4 MB. Maybe this is something poppler (pdf2image) is doing that inflates the image after it has been converted to a PDF. – Jeremy Jan 13 '22 at 03:04
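
The size mismatch raised in the comments can be checked before calling Textract at all. Below is a minimal sketch, not a confirmed diagnosis: describe_payload is a hypothetical helper, and ASSUMED_MAX_BYTES is a placeholder value rather than a documented limit, so the real figure should be taken from the current Textract quotas. The stand-in img only reproduces the length reported in the question.

```python
# Placeholder limit for illustration only; this is an assumption,
# not a value taken from the Textract documentation.
ASSUMED_MAX_BYTES = 10 * 1024 * 1024


def describe_payload(data: bytes, max_bytes: int = ASSUMED_MAX_BYTES) -> str:
    """Report the in-memory payload size and compare it to an assumed limit."""
    size_mb = len(data) / 1_000_000  # decimal megabytes
    status = "within" if len(data) <= max_bytes else "over"
    return f"{len(data)} bytes ({size_mb:.1f} MB), {status} the assumed limit"


# Stand-in payload with the same length as img in the question.
img = bytes(22023165)
print(describe_payload(img))
```

Running the same check on the bytes read back from the saved 2.4 MB file would show whether the in-memory buffer, rather than the image itself, is what exceeds the limit.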

0 Answers