
I'm trying out Amazon Transcribe on a collection of media files. I adapted the sample code from the docs, using this series as a reference, so that any upload to my designated media S3 folder starts a transcription job, but I'm having issues with my test file.

Upload bucket/folder path:

`'MediaFileUri': https://us-west-2.console.aws.amazon.com/s3/buckets/upload-asr/mediaupload/file.mp4`

I've verified that the file exists and that the bucket permissions grant access to the Amazon Transcribe service. I can start a transcription job manually with the same URL, but not through the SDK. I've also hard-coded the path above directly in the function, without success. I realize it might be a URL path issue, but I haven't found much on the subject, so I'm checking for an obvious error.
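For reference, the path above is the address of an S3 console web page, not of the object itself. The sketch below builds the two object-address forms that come up later in this thread: the `s3://` form quoted in the comments and the path-style HTTPS form used in the answer. `s3_object_uris` is a hypothetical helper, and which forms Transcribe accepts is worth confirming against the current StartTranscriptionJob reference.

```python
from urllib.parse import quote


def s3_object_uris(bucket: str, key: str, region: str) -> dict:
    """Build object addresses for bucket/key (hypothetical helper).

    A console link such as https://<region>.console.aws.amazon.com/s3/buckets/...
    is a page for humans in the browser, not an address for the object.
    """
    return {
        # s3://bucket/key form, as quoted in the comments
        "s3": f"s3://{bucket}/{key}",
        # path-style HTTPS form, as used in the answer; quote() keeps "/" intact
        "https": f"https://s3-{region}.amazonaws.com/{bucket}/{quote(key)}",
    }


uris = s3_object_uris("upload-asr", "mediaupload/file.mp4", "us-west-2")
print(uris["s3"])     # s3://upload-asr/mediaupload/file.mp4
print(uris["https"])  # https://s3-us-west-2.amazonaws.com/upload-asr/mediaupload/file.mp4
```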

```python
import boto3


def lambda_handler(event, context):
    transcribe = boto3.client("transcribe")
    s3 = boto3.client("s3")

    if event:
        # First record of the S3 notification that triggered this Lambda
        file_obj = event["Records"][0]
        bucket_name = str(file_obj['s3']['bucket']['name'])
        file_name = str(file_obj['s3']['object']['key'])
        file_type = file_name.split(".")[1]  # e.g. "mp4" for "mediaupload/file.mp4"
        s3_uri = create_uri(bucket_name, file_name)
        job_name = context.aws_request_id  # unique per invocation

        transcribe.start_transcription_job(TranscriptionJobName=job_name,
                                           Media={'MediaFileUri': s3_uri},
                                           OutputBucketName="bucket-name",
                                           MediaFormat=file_type,
                                           LanguageCode="en-US")


def create_uri(bucket_name, file_name):
    # Body was not included in the post; reconstructed from the URI quoted
    # in the comments ("s3://upload-asr/mediaupload/file.mp3").
    return "s3://" + bucket_name + "/" + file_name
```
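The handler above uses the key exactly as it arrives in the event and takes the format with `split(".")[1]`. Two details worth ruling out: S3 event notifications deliver the object key URL-encoded (a space arrives as `+`), and `split(".")[1]` picks the wrong piece for names with more than one dot. A minimal sketch of a more defensive parse; `media_from_s3_event` and the sample event are hypothetical.

```python
import os
from urllib.parse import unquote_plus


def media_from_s3_event(event: dict) -> tuple:
    """Return (bucket, key, media_format) from the first S3 notification record."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Event keys are URL-encoded; decode before building a URI from them.
    key = unquote_plus(record["object"]["key"])
    # splitext takes the last extension; split(".")[1] would return "v2" here.
    media_format = os.path.splitext(key)[1].lstrip(".").lower()
    return bucket, key, media_format


# Hypothetical event for a file uploaded as "mediaupload/my talk.v2.mp4"
sample = {"Records": [{"s3": {"bucket": {"name": "upload-asr"},
                              "object": {"key": "mediaupload/my+talk.v2.mp4"}}}]}
print(media_from_s3_event(sample))
# ('upload-asr', 'mediaupload/my talk.v2.mp4', 'mp4')
```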

CloudWatch Log Failure Report:

```
[ERROR] BadRequestException: An error occurred (BadRequestException) when calling the StartTranscriptionJob operation: 
The URI that you provided doesn't point to an S3 object. Make sure that the object exists and try your request again.

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 25, in lambda_handler
    LanguageCode = "en-US")
  File "/var/runtime/botocore/client.py", line 320, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 623, in _make_api_call
    raise error_class(parsed_response, operation_name) 
```
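Since the message says the URI doesn't point to an S3 object, one way to separate "wrong URI" from "object not visible" is to ask S3 for the same bucket and key from inside the Lambda before calling Transcribe. A minimal sketch, assuming a boto3 S3 client is passed in (the handler already creates one as `s3`); `object_exists` is a hypothetical helper. HEAD needs `s3:GetObject` on the Lambda role, and without `s3:ListBucket` S3 reports a missing key as 403 rather than 404, so a 403 here is ambiguous.

```python
def object_exists(s3_client, bucket: str, key: str) -> bool:
    """Return True if S3 answers HEAD for bucket/key, False if it reports the key missing.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Other errors (such as "403" AccessDenied) are re-raised instead of
    being reported as a missing object.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except s3_client.exceptions.ClientError as err:
        # HEAD responses carry no body, so a missing key shows up as code "404"
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise


# Hypothetical use inside the handler, before start_transcription_job:
# if not object_exists(s3, bucket_name, file_name):
#     print(f"S3 cannot see s3://{bucket_name}/{file_name}")
```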

SIMILAR: https://forums.aws.amazon.com/thread.jspa?messageID=876906&#876906

John Rotenstein
asdwasow
  • Well, the error says `The URI that you provided doesn't point to an S3 object`. What are the _exact_ contents of the URI? Is the S3 bucket in the same region as the Transcribe function you are calling? It appears that you are using a bucket in us-west-2, but the `boto3.client("transcribe")` line does not specify a region. (It might be defaulting to the right region, but it is worth checking!) – John Rotenstein Jul 04 '19 at 00:51
  • `s3_uri = "s3://upload-asr/mediaupload/file.mp3"` That's the exact content, as in my original post. The only discrepancy is that I initially set MediaFormat from a `file_type = file_name.split(".")[1]` variable before hard-coding mp3/mp4 each time. I've set `transcribe = boto3.client("transcribe", region_name='us-west-2')` as suggested, but it makes no difference either way. Uploads to this specific folder in any format should trigger the API call, but there's no sign the code above works the way it does in the examples. https://amzn.to/2XUkVuc – asdwasow Jul 08 '19 at 01:16

1 Answer


It works for me using this format:

```python
Media={
    'MediaFileUri': f'https://s3-us-west-2.amazonaws.com/{BUCKET}/{KEY}'
},
```
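Put together with the question's handler, the answer's format might look like the sketch below. It assumes the bucket's region is known (us-west-2, per the question's URL) and that the Transcribe client is created in that same region, as suggested in the comments; `start_job` is a hypothetical helper and `"bucket-name"` is the question's placeholder output bucket.

```python
def start_job(transcribe_client, job_name: str, bucket: str, key: str,
              media_format: str, region: str, output_bucket: str) -> dict:
    """Start a Transcribe job for bucket/key using the answer's URL form.

    transcribe_client is assumed to be boto3.client("transcribe", region_name=region),
    i.e. created in the same region as the bucket.
    """
    # Path-style S3 URL, as in the answer: https://s3-<region>.amazonaws.com/<bucket>/<key>
    media_uri = f"https://s3-{region}.amazonaws.com/{bucket}/{key}"
    return transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode="en-US",
        OutputBucketName=output_bucket,
    )


# Hypothetical use inside lambda_handler:
# transcribe = boto3.client("transcribe", region_name="us-west-2")
# start_job(transcribe, context.aws_request_id, bucket_name, file_name,
#           file_type, "us-west-2", "bucket-name")
```

Passing the client in, rather than creating it inside the helper, keeps the region choice in one visible place in the handler.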
John Rotenstein