How to get S3 object from AWS-Textract when called by Lambda in a private Subnet?

Question

TL;DR

I'm working on this solution, have PDF in S3 and can call Textract manually (using boto3 from my own laptop). When called from Lambda (private VPC) I get this error:

An error occurred (InvalidS3ObjectException) when calling the StartDocumentAnalysis operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.

About the solution

Lambda is running with a role that has "s3:*" on the buckets.

Lambda is being triggered by a message in RabbitMQ.

Textract and S3 are in the same region.

Previous to Textract call I have a check like this:

s3 = boto3.resource('s3', region_name=AWS_REGION)
try:
    # verify object existence
    s3.Object(bucket, objectkeyTsubT)
except Exception as e:
    raise e

The object is found.

Textract is being called in the same region:

config = Config(retries=dict(max_attempts=30))
boto3.client('textract', region_name=AWS_REGION, config=config)

I understand the role the Lambda is assuming is the one that provides Textract with permission, so is this role which has to have access to the buckets.

I don't fully understand how Textract works with VPC endpoints. The facts are: Textract is being reached by Lambda; S3 input bucket is being read by Lambda.

Do I need to grant other access method from Textract to S3?

Does the S3 bucket in question have a bucket policy that restricts access to this VPC, or any other Deny policy that could apply? — jarmod, Mar 28 '22 at 21:26
@jarmod thanks for answering... no, no deny policies... I try to avoid these type of policies as much as possible... actually I can read the object from the lambda in the check — JuanMatias, Mar 29 '22 at 12:11

How to get S3 object from AWS-Textract when called by Lambda in a private Subnet?

0 Answers0