
My setup is the following:

React Native app client -> AWS API Gateway -> AWS Lambda function -> AWS S3 -> AWS Transcribe -> AWS S3

I can successfully upload an audio file to an S3 bucket from the Lambda, start the transcription, and even open the result manually in the S3 bucket. However, when I try to fetch the JSON file with the transcription data using TranscriptFileUri, I get a 403 response.

On the s3 bucket with the transcriptions I have the following CORS configuration:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "PUT"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "ETag"
        ]
    }
]

My lambda function code looks like this:

response = client.start_transcription_job(
    TranscriptionJobName=jobName,
    LanguageCode='en-US',
    MediaFormat='mp4',
    Media={
        'MediaFileUri': s3Path
    },
    OutputBucketName='my-transcription-bucket',
    OutputKey=str(user_id) + '/'
)

while True:
    result = client.get_transcription_job(TranscriptionJobName=jobName)
    if result['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(5)

if result['TranscriptionJob']['TranscriptionJobStatus'] == "COMPLETED":
    data = result['TranscriptionJob']['Transcript']['TranscriptFileUri']
    data = requests.get(data)
    print(data)

In CloudWatch, printing the response shows: <Response [403]>

Youcef LAIDANI
Kronax
    Note: the better way to 'wait' for the transcription job to complete is not to wait at all, especially not in a Lambda function. Instead, configure a [notification event](https://docs.aws.amazon.com/transcribe/latest/dg/monitoring-events.html) and have that trigger another Lambda function. – jarmod Nov 10 '22 at 14:15
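
The event-driven approach from the comment above could look roughly like the following second Lambda. This is a minimal sketch, not a tested deployment: it assumes an EventBridge rule forwards Transcribe job state change events to the function, and that the event's `detail` carries `TranscriptionJobName` and `TranscriptionJobStatus` as described in the linked monitoring documentation. The handler name and the shape of the returned dict are illustrative.

```python
def lambda_handler(event, context):
    """Handle a Transcribe job state change event delivered by EventBridge.

    Assumption: the event's "detail" holds TranscriptionJobName and
    TranscriptionJobStatus. Verify this against a real event before relying on it.
    """
    detail = event.get("detail", {})
    job_name = detail.get("TranscriptionJobName")
    status = detail.get("TranscriptionJobStatus")

    if status != "COMPLETED":
        # FAILED (or anything unexpected): nothing to download, just report it.
        return {"job": job_name, "status": status, "handled": False}

    # At this point the transcript can be read from the output bucket with an
    # authenticated S3 call; no polling loop or time.sleep() is needed.
    return {"job": job_name, "status": status, "handled": True}
```

With this split, the first Lambda only starts the job and returns immediately, so it is not billed for the time spent waiting on the transcription.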

1 Answer


As far as I can tell, your code calls requests.get(data), where data is the TranscriptFileUri. What does that URI look like? Is it signed? If not, as I suspect, then you cannot use requests to fetch the file from S3: an unauthenticated HTTP GET only works for a presigned URL or a public object.

You should use an authenticated mechanism such as get_object.
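
A minimal sketch of that approach is below. It assumes the TranscriptFileUri is a path-style S3 URL of the form `https://s3.<region>.amazonaws.com/<bucket>/<key>`; check the actual value in your logs, since that shape is an assumption here. The helper names `split_transcript_uri` and `fetch_transcript` are hypothetical, and the Lambda's execution role needs `s3:GetObject` on the output bucket.

```python
import json
from urllib.parse import unquote, urlparse


def split_transcript_uri(uri):
    """Split a path-style S3 URL into (bucket, key).

    Assumption: the first path segment is the bucket, the rest is the key.
    """
    path = unquote(urlparse(uri).path).lstrip("/")
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"Not a path-style S3 URL: {uri}")
    return bucket, key


def fetch_transcript(uri):
    """Read the transcript JSON with an authenticated get_object call."""
    import boto3  # imported here so the URL helper works without boto3 installed

    bucket, key = split_transcript_uri(uri)
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    payload = json.loads(obj["Body"].read().decode("utf-8"))
    # Transcribe stores the full text under results.transcripts[0].transcript.
    return payload["results"]["transcripts"][0]["transcript"]
```

In the question's code, `fetch_transcript(result['TranscriptionJob']['Transcript']['TranscriptFileUri'])` would take the place of the `requests.get(data)` call. Since the bucket and key are already known from OutputBucketName and OutputKey, passing those to `get_object` directly avoids parsing the URL at all.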

jarmod
    That was it, thank you, you're a genius! This is how I eventually got the transcript:

        if result['TranscriptionJob']['TranscriptionJobStatus'] == "COMPLETED":
            transcriptionObject = s3.get_object(Bucket=transcriptionBucketName, Key=transcriptionKey)
            transcriptionJson = json.loads(transcriptionObject['Body'].read().decode('utf-8'))
            transcript = transcriptionJson['results']['transcripts'][0]['transcript']
            print(transcript)

    – Kronax Nov 10 '22 at 15:13
  • Sorry but was not able to format my code in my previous comment properly. – Kronax Nov 10 '22 at 15:20