I am using AWS Transcribe for speech recognition. I have created a custom vocabulary, but I cannot find any Boto3 code snippet showing how to use it from Python. My current code is below.

import boto3

client_transcribe = boto3.client('transcribe')
client_transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': file_url},
    MediaFormat='mp4',
    LanguageCode='en-US',
    OutputBucketName=bucket)

1 Answer

The vocabulary name goes in the VocabularyName key of the Settings dictionary, which you pass as the Settings parameter of the start_transcription_job method.

Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job

Example:

settings = {
    'VocabularyName': 'your-custom-vocabulary-name-goes-here'
}

client_transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    LanguageCode='your-language-code-goes-here',
    Settings=settings,
    MediaFormat='mp4',
    OutputBucketName=bucket,
    Media={
        'MediaFileUri': file_url
    })

To determine the language code of your vocabulary, run the following AWS CLI command from your terminal (this assumes the AWS CLI is installed):

aws transcribe get-vocabulary --vocabulary-name {your-custom-vocabulary-name}

It returns a response such as:

{
  "LastModifiedTime": 1573523589.419,
  "VocabularyName": "redacted",
  "DownloadUri": "redacted",
  "LanguageCode": "en-US",
  "VocabularyState": "READY"
}

For example, if the language code for your vocabulary is en-US, then use that language code when calling start_transcription_job.
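
The same lookup can also be done from Python instead of the CLI. The following is a minimal sketch, not a definitive implementation: the helper name start_job_with_vocabulary is hypothetical (it is not part of Boto3), and it assumes the vocabulary already exists. It reads the language code with get_vocabulary and passes it to start_transcription_job, so the two should always match.

```python
def start_job_with_vocabulary(client, job_name, file_url, bucket,
                              vocabulary_name, media_format="mp4"):
    """Start a transcription job using the vocabulary's own language code.

    `client` is expected to be a Boto3 Transcribe client, e.g.
    boto3.client('transcribe'). This helper is illustrative only.
    """
    # Look up the vocabulary to learn its language code and state.
    vocab = client.get_vocabulary(VocabularyName=vocabulary_name)

    # A vocabulary that is still PENDING or has FAILED cannot be used yet.
    if vocab["VocabularyState"] != "READY":
        raise RuntimeError(
            "Vocabulary %r is in state %s, expected READY"
            % (vocabulary_name, vocab["VocabularyState"])
        )

    # Reuse the vocabulary's language code so it matches the job's.
    return client.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode=vocab["LanguageCode"],
        MediaFormat=media_format,
        Media={"MediaFileUri": file_url},
        OutputBucketName=bucket,
        Settings={"VocabularyName": vocabulary_name},
    )
```

With the variables from the question, this would be called as start_job_with_vocabulary(boto3.client('transcribe'), job_name, file_url, bucket, 'your-custom-vocabulary-name-goes-here').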

Hope this helps!

  • Thank you. I tested that and got a Bad Request error, "when calling the StartTranscriptionJob operation: The language used in the specified vocabulary doesn't match the specified language code.", when LanguageCode='en-IN' instead of 'en-US' – Gaurav Jaglan Dec 12 '19 at 08:15
  • I generalized my response to indicate that you should use the language code of the vocabulary when invoking `start_transcription_job`. Try using `en-IN` in place of `en-US`. If this helps, please feel free to mark as answered. If you still have questions, let me know. – Joel Van Hollebeke Dec 13 '19 at 21:55
  • I put an AWS Transcribe example on GitHub that shows how to [create a custom vocabulary](https://github.com/awsdocs/aws-doc-sdk-examples/blob/6ff2068156ee2224c620bdef8941aadefad713fb/python/example_code/transcribe/transcribe_basics.py#L160) and [use it in a job](https://github.com/awsdocs/aws-doc-sdk-examples/blob/6ff2068156ee2224c620bdef8941aadefad713fb/python/example_code/transcribe/transcribe_basics.py#L160). The code is similar to this answer, but it might help you to see it in context. – Laren Crawford Sep 14 '20 at 18:51
  • I created a vocabulary from the console, and when I run aws transcribe get-vocabulary --vocabulary-name {your-custom-vocabulary-name} (with {your-custom-vocabulary-name} replaced by my vocabulary name), it gives me a vocabulary-not-found error – Plasmatiger Apr 19 '21 at 11:01
  • Hi @Plasmatiger, what do you get when you try the following command? aws transcribe list-vocabularies – Joel Van Hollebeke Apr 19 '21 at 18:40
  • Hi Joel, sorry for the late reply. I figured out the issue: my environment was also picking up the AWS credentials embedded in my system. I spun up a new server and it worked fine :) Thanks for your prompt reply. – Plasmatiger Apr 24 '21 at 04:06
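
For the "vocabulary not found" case discussed in the comments, the list-vocabularies check can be run from Python as well. This is a rough sketch under assumptions: find_vocabulary is a hypothetical helper name, and `client` is assumed to be a Boto3 Transcribe client created with the same credentials and region your code normally uses. It pages through list_vocabularies and returns the matching entry, if any.

```python
def find_vocabulary(client, vocabulary_name):
    """Return the summary dict for vocabulary_name, or None if this
    client's account and region do not contain it.

    Illustrative helper; `client` is expected to be
    boto3.client('transcribe').
    """
    kwargs = {}
    while True:
        page = client.list_vocabularies(**kwargs)
        for vocab in page.get("Vocabularies", []):
            if vocab["VocabularyName"] == vocabulary_name:
                return vocab
        # Results may be split across pages; follow NextToken until done.
        token = page.get("NextToken")
        if not token:
            return None
        kwargs["NextToken"] = token
```

If this returns None for a vocabulary that is visible in the console, one likely explanation is that the client is using a different region or different credentials than the console session, which matches the credentials mix-up described above.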