AWS Transcribe provides two ways to create a custom vocabulary (for more information, see Custom Vocabularies):
- Using a list
- Using a table
I can create custom vocabularies both ways via the AWS console, but with the AWS Java SDK I can only create one using a list. When I try the "using a table" approach, the vocabulary fails with this error:
Failure reason:
The vocabulary that you’re trying to create contains invalid characters or incorrectly formatted terms. See the developer guide for more information.
AmazonTranscribe transcribe = AmazonTranscribeClient.builder().build();
CreateVocabularyRequest vocabularyRequest = new CreateVocabularyRequest();
vocabularyRequest.setLanguageCode(LanguageCode.EnUS.toString());
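// Table attempt: tab-separated header row followed by one tab-separated entry (IPA column left empty)
vocabularyRequest.setPhrases(Arrays.asList("Phrase\tIPA\tSoundsLike\tDisplayAs", "helloooo\t\thello\thailo"));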
vocabularyRequest.setVocabularyName("table-clone");
CreateVocabularyResult vocabularyResult = transcribe.createVocabulary(vocabularyRequest);
However, I can create the same vocabulary using a table via the AWS console, so I don't think the vocabulary content itself is the problem.
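For reference, here is a minimal sketch in plain Java of the tab-separated table content I am trying to submit. `VocabularyTable`, `row`, and `build` are hypothetical helper names for illustration only, not part of the AWS SDK. The four column names come from the header row in my request above, and the one-entry-per-line layout is my assumption about how the table file is structured.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the AWS SDK): assembles the
// tab-separated table text with the four columns used in the question.
class VocabularyTable {
    // Header row, columns separated by a single tab.
    static final String HEADER =
            String.join("\t", "Phrase", "IPA", "SoundsLike", "DisplayAs");

    // One table row; pass "" for a column that is left empty.
    static String row(String phrase, String ipa, String soundsLike, String displayAs) {
        return String.join("\t", phrase, ipa, soundsLike, displayAs);
    }

    // Header followed by each row, one per line (assumed layout).
    static String build(List<String> rows) {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (String r : rows) {
            sb.append(r).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same entry as in the failing request: IPA is left empty.
        String table = build(Arrays.asList(row("helloooo", "", "hello", "hailo")));
        System.out.print(table);
    }
}
```

The strings produced by `HEADER` and `row(...)` are identical to the two elements I pass to `setPhrases`, so the tabs and the empty IPA column are formatted consistently.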
Case 1: Via AWS Console
One more important thing to note: when a vocabulary is created using the list view, AWS appends an end delimiter (ENDOFDICTIONARYTRANSCRIBE), but it does not append this delimiter when the vocabulary is created using the table view.
Case 2: Via AWS Java SDK
The end delimiter is appended at the end of the file in both cases (list and table). I suspect this may be the cause of the error.
To Sum Up
I want to create a custom vocabulary using a table via the AWS Java SDK. I can create the same vocabulary via the AWS console, but it fails when I use the Java SDK.