For a project, I am trying to use Azure's speech pronunciation assessment in Java to assess whether the user pronounces the words well. Currently I am using an approach based on event listeners:
```java
import com.google.gson.JsonParser;
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import javafx.application.Platform;

// Method body; SpeechUtil, Score, controller, complete(...) and wordsReadToIndex(...) come from my own project.
System.out.println("Starting recording with " + this.prompt);
// The last argument (true) enables miscue detection, i.e. omissions and insertions.
PronunciationAssessmentConfig pronunciationAssessmentConfig = new PronunciationAssessmentConfig(this.getPrompt(),
        PronunciationAssessmentGradingSystem.HundredMark, PronunciationAssessmentGranularity.Phoneme, true);
// PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"" + getPrompt() + "\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\", \"miscue\":true}");
AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
SpeechUtil.SPEECH_CONFIG.setOutputFormat(OutputFormat.Detailed);
SpeechRecognizer speechRecognizer = new SpeechRecognizer(SpeechUtil.SPEECH_CONFIG, audioConfig);
pronunciationAssessmentConfig.applyTo(speechRecognizer);

// Intermediate results: highlight the words read so far and stop once
// as many words as the reference text contains have been recognized.
speechRecognizer.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
    try {
        final int words = speechRecognitionResultEventArgs.getResult().getText().split(" ").length;
        System.out.println("Recognizing: " + speechRecognitionResultEventArgs.getResult().getText());
        Platform.runLater(() -> this.controller.setSpoken(0, wordsReadToIndex(words)));
        System.out.println(words);
        System.out.println(pronunciationAssessmentConfig.getReferenceText());
        System.out.println(speechRecognitionResultEventArgs.getResult().getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult));
        if (words >= pronunciationAssessmentConfig.getReferenceText().split(" ").length)
            speechRecognizer.stopContinuousRecognitionAsync();
    } catch (Throwable e) {
        e.printStackTrace();
    }
});

// Final result of a phrase: read the pronunciation scores and the full JSON, then stop.
speechRecognizer.recognized.addEventListener((o, speechRecognitionEventArgs) -> {
    System.out.println("Recognized!");
    try {
        PronunciationAssessmentResult pronunciationAssessmentResult =
                PronunciationAssessmentResult.fromResult(speechRecognitionEventArgs.getResult());
        if (pronunciationAssessmentResult == null) return;
        System.out.println(pronunciationAssessmentResult.getAccuracyScore());
        String jsonString = speechRecognitionEventArgs.getResult().getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult);
        System.out.println(jsonString);
        Platform.runLater(() -> this.complete(new Score(pronunciationAssessmentResult.getAccuracyScore().intValue(), JsonParser.parseString(jsonString))));
        speechRecognizer.stopContinuousRecognitionAsync();
    } catch (Throwable e) {
        e.printStackTrace();
    }
});

// Should fire when the service detects the end of speech.
speechRecognizer.speechEndDetected.addEventListener((o, speechRecognitionEventArgs) -> {
    System.out.println("Speech end detected!");
});

// Start only after all listeners are registered, so no early events are missed.
speechRecognizer.startContinuousRecognitionAsync();
```
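The stop condition in the `recognizing` listener compares `getText().split(" ").length` with the word count of the reference text. Below is a minimal, SDK-free sketch of that comparison; `WordCount`, `countWords` and `shouldStop` are hypothetical names, not Speech SDK API. It trims and splits on runs of whitespace, because `"".split(" ")` returns one empty element and would count an empty partial result as one word. It also makes a side effect of the condition visible: when the speaker only omits words, the count never reaches the reference length, so this listener never stops the recognizer; when they insert a word, it can stop before they have finished the sentence.

```java
final class WordCount {
    // Counts whitespace-separated tokens; blank input counts as zero words.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Same idea as the stop condition in the recognizing listener:
    // stop once the partial text has at least as many words as the reference.
    static boolean shouldStop(String partialText, String referenceText) {
        return countWords(partialText) >= countWords(referenceText);
    }

    public static void main(String[] args) {
        String reference = "Hello, could I buy one of these books you are selling";
        System.out.println(countWords(reference)); // 11
        System.out.println("".split(" ").length);  // 1, the pitfall of split(" ")
        System.out.println(countWords(""));        // 0
        // The sentence from the question, with "of" omitted: 10 < 11 words, so no stop.
        System.out.println(shouldStop("hello could I buy one these books you are selling", reference)); // false
    }
}
```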
However, it does not detect any omissions or insertions, even though miscue detection is enabled in the assessment config (the last constructor argument, `true`). It also never detects the end of speech: the `speechEndDetected` listener never fires.
*Result of saying 'Hello, could I buy one these books you are selling' with reference 'Hello, could I buy one of these books you are selling':* https://pastebin.com/tvAVw4s3
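For reference, with the two sentences above, miscue detection would be expected to mark the reference word "of" as an omission. The sketch below shows that expectation with a client-side word alignment (a longest-common-subsequence alignment over normalized words). It is not the algorithm the Speech service uses, and `MiscueSketch`, `normalize` and `align` are hypothetical names. It lowercases words and strips punctuation before comparing, so "Hello," and "hello" match, and labels each word `None`, `Omission` or `Insertion`, which makes it easy to compare against the per-word `ErrorType` values in the JSON result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MiscueSketch {
    // Lowercases, splits on whitespace and keeps only letters, digits and apostrophes.
    static List<String> normalize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            String w = token.replaceAll("[^\\p{L}\\p{N}']", "");
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // lcs[i][j] = length of the longest common subsequence of ref[i..] and spoken[j..].
    // Walking the table labels each word as matched (None), only in the
    // reference (Omission) or only in the spoken text (Insertion).
    static List<String> align(String reference, String spoken) {
        List<String> ref = normalize(reference);
        List<String> hyp = normalize(spoken);
        int n = ref.size(), m = hyp.size();
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = ref.get(i).equals(hyp.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && ref.get(i).equals(hyp.get(j))) {
                out.add(ref.get(i) + ":None");
                i++;
                j++;
            } else if (j >= m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
                out.add(ref.get(i) + ":Omission");
                i++;
            } else {
                out.add(hyp.get(j) + ":Insertion");
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String reference = "Hello, could I buy one of these books you are selling";
        String spoken = "Hello, could I buy one these books you are selling";
        // The entry for "of" is labeled Omission; every other word is None.
        System.out.println(align(reference, spoken));
    }
}
```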
I have also tried the `pronunciationAssessmentWithMicrophoneAsync()` function from this sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechRecognitionSamples.java#L985
But even this example, adapted from the Microsoft Azure GitHub repo, never returns anything useful, and when it does return something, it is:
```
CANCELED: ErrorCode=ServiceTimeout
CANCELED: ErrorDetails=Timeout: no recognition result received SessionId: 09f1a5492851429e81e4672c90144a37
CANCELED: Did you update the subscription info?
```
**EDIT:**
I have found out that this happens because of the combination of a noisy environment and pronunciation assessment: with exactly the same code and environment, recognition does return results when pronunciation assessment is not applied.