How can I use AWS Textract with Python

Question

I have tested almost every example I can find online for Amazon Textract and I can't get it to work. I can upload and download files to S3 from my Python client, so the credentials should be OK. Many of the errors point to some region failure, but I have tried every possible combination.

Here is one of my latest test calls:

import boto3


def test_parse_3():
    # Document
    s3BucketName = "xx-xxxx-xx"
    documentName = "xxxx.jpg"

    # Amazon Textract client
    textract = boto3.client('textract')

    # Call Amazon Textract on the object stored in S3
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': documentName
            }
        })

    print(response)

It seems pretty simple, but it raises this error:

botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the DetectDocumentText operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.

Any ideas what's wrong? Does anyone have a working example?

I have also tried a lot of permission settings in AWS. The credentials are in the hidden files created by the AWS SDK.

What region is the bucket you are using in? – Ninad Gaikwad, Aug 19 '20 at 15:17

score 0 · Answer 1 · answered Jan 06 '21 at 19:48

I am sure you already know, but bucket names and object keys are case sensitive. If you have verified that both the bucket and the object name are correct, make sure to set the appropriate region (the one the bucket is in) in your AWS configuration.
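A quick way to rule out a wrong (or wrongly cased) bucket name, object key, or missing read permission is to fetch the object's metadata yourself with S3 `HeadObject`, since metadata is exactly what the error says Textract could not get. This is a minimal sketch that assumes you pass in a boto3 S3 client; `check_s3_object` is a hypothetical helper name, and it relies on the `exceptions.ClientError` attribute that boto3 clients expose.

```python
def check_s3_object(s3_client, bucket, key):
    """Return "OK" if the object's metadata is readable, else the S3 error code.

    HeadObject reports a missing key as "404" and a permission problem as "403".
    Note: without s3:ListBucket permission, a missing key is also reported as "403".
    """
    try:
        # Only metadata is fetched; the object body is not downloaded.
        s3_client.head_object(Bucket=bucket, Key=key)
    except s3_client.exceptions.ClientError as err:
        return err.response["Error"]["Code"]
    return "OK"
```

For example, `check_s3_object(boto3.client("s3"), "xx-xxxx-xx", "xxxx.jpg")` should return `"OK"` before it is worth looking at Textract itself; `"xxxx.JPG"` and `"xxxx.jpg"` are different keys.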

I tested reading from S3 without setting the region, and I was able to list the objects in the bucket with no issues. I think this works because S3 bucket names are global, so the S3 client can reach the bucket regardless of the configured region. Textract, however, is a regional service and can only read documents from a bucket in the same region as the Textract endpoint you call, so you must set the region explicitly when using Textract to get data from an S3 bucket.
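A minimal sketch of that fix, assuming your credentials are also allowed to call `s3:GetBucketLocation`: look up the bucket's region with S3 `GetBucketLocation`, create the Textract client with `region_name` set to that region, and return the text of each `LINE` block in the response. The function names are hypothetical helpers, not part of boto3; the session object is passed in so nothing touches AWS until you call it.

```python
def bucket_region(s3_client, bucket):
    """Return the region a bucket lives in, using S3 GetBucketLocation."""
    location = s3_client.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    # GetBucketLocation reports None for us-east-1 and "EU" for some legacy eu-west-1 buckets.
    if location is None:
        return "us-east-1"
    if location == "EU":
        return "eu-west-1"
    return location


def line_texts(response):
    """Collect the text of every LINE block in a DetectDocumentText response."""
    return [b["Text"] for b in response.get("Blocks", []) if b.get("BlockType") == "LINE"]


def detect_text_in_bucket_region(session, bucket, key):
    """Call Textract in the same region as the bucket that holds the document."""
    region = bucket_region(session.client("s3"), bucket)
    # The Textract client must point at the bucket's region, or Textract cannot read the object.
    textract = session.client("textract", region_name=region)
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return line_texts(response)
```

Call it with a real session, e.g. `detect_text_in_bucket_region(boto3.session.Session(), "xx-xxxx-xx", "xxxx.jpg")`, and print the returned lines. If you already know the bucket's region, passing it directly as `boto3.client("textract", region_name=...)` is simpler and avoids the extra S3 call.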

I realize this was asked a few months ago, but I hope this sheds some light for others who face this issue in the future.
