TL;DR
I'm working on this solution, have PDF in S3 and can call Textract manually (using boto3 from my own laptop). When called from Lambda (private VPC) I get this error:
An error occurred (InvalidS3ObjectException) when calling the StartDocumentAnalysis operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
About the solution
Lambda is running with a role that has "s3:*" on the buckets.
Lambda is being triggered by a message in RabbitMQ.
Textract and S3 are in the same region.
Previous to Textract call I have a check like this:
s3 = boto3.resource('s3', region_name=AWS_REGION)
try:
# verify object existence
s3.Object(bucket, objectkeyTsubT)
except Exception as e:
raise e
The object is found.
Textract is being called in the same region:
config = Config(retries=dict(max_attempts=30))
boto3.client('textract', region_name=AWS_REGION, config=config)
I understand the role the Lambda is assuming is the one that provides Textract with permission, so is this role which has to have access to the buckets.
I don't fully understand how Textract works with VPC endpoints. The facts are: Textract is being reached by Lambda; S3 input bucket is being read by Lambda.
Do I need to grant other access method from Textract to S3?