Speech Assessment in Java never returning insertions and omissions nor detecting speech end

Question

For a project, I am trying to use Azure's speech assessment in Java to assess how well the user pronounces the words of a prompt. Currently I am using an approach based on the recognizer's event listeners:

        System.out.println("Starting recording with " + this.prompt);
        PronunciationAssessmentConfig pronunciationAssessmentConfig = new PronunciationAssessmentConfig(this.getPrompt(),
                PronunciationAssessmentGradingSystem.HundredMark, PronunciationAssessmentGranularity.Phoneme, true);
//        PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"" + getPrompt() + "\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\", \"miscue\":true}");
        AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
        SpeechUtil.SPEECH_CONFIG.setOutputFormat(OutputFormat.Detailed);

        SpeechRecognizer speechRecognizer = new SpeechRecognizer(
                SpeechUtil.SPEECH_CONFIG,
                audioConfig);

        pronunciationAssessmentConfig.applyTo(speechRecognizer);

        speechRecognizer.startContinuousRecognitionAsync();

        speechRecognizer.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
            try {
                final int words = speechRecognitionResultEventArgs.getResult().getText().split(" ").length;
                System.out.println("Recognizing: " + speechRecognitionResultEventArgs.getResult().getText());
                Platform.runLater(() -> this.controller.setSpoken(0, wordsReadToIndex(words)));
                System.out.println(words);
                System.out.println(pronunciationAssessmentConfig.getReferenceText());
                System.out.println(speechRecognitionResultEventArgs.getResult().getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult));

                if (words >= pronunciationAssessmentConfig.getReferenceText().split(" ").length)
                    speechRecognizer.stopContinuousRecognitionAsync();

            } catch (Throwable e) {
                e.printStackTrace();
            }
        });

        speechRecognizer.recognized.addEventListener((o, speechRecognitionEventArgs) -> {
            System.out.println("Recognized!");
            try {
                PronunciationAssessmentResult pronunciationAssessmentResult =
                        PronunciationAssessmentResult.fromResult(speechRecognitionEventArgs.getResult());
                if (pronunciationAssessmentResult == null) return;
                System.out.println(pronunciationAssessmentResult.getAccuracyScore());
                String jsonString = speechRecognitionEventArgs.getResult().getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult);
                System.out.println(jsonString);
                Platform.runLater(() -> this.complete(new Score(pronunciationAssessmentResult.getAccuracyScore().intValue(), JsonParser.parseString(jsonString))));
                speechRecognizer.stopContinuousRecognitionAsync();
            } catch (Throwable e) {
                e.printStackTrace();
            }
        });
//
        speechRecognizer.speechEndDetected.addEventListener((o, speechRecognitionEventArgs) -> {
            System.out.println("Speech end detected!");
        });

However, it does not detect any omissions or insertions, even though miscue detection is turned on in the assessment settings. It also never detects the end of speech.

*Result of saying 'Hello, could I buy one these books you are selling' with reference 'Hello, could I buy one of these books you are selling':* https://pastebin.com/tvAVw4s3
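To make the expected outcome for the utterance above concrete, here is a minimal sketch of a local word-level comparison between the reference text and the recognized text. It is not the service's own miscue detection, only a plain longest-common-subsequence diff; the class `MiscueDiff` and its methods `tokenize` and `diff` are hypothetical names introduced for illustration, and the labels "Omission" and "Insertion" are simply strings chosen here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: local word-level diff of reference vs. recognized text. */
class MiscueDiff {

    /** Lowercases, replaces punctuation with spaces, and splits on whitespace. */
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}'\\s]", " ")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    /** Returns one entry per reference word that was skipped or extra word that was spoken. */
    static List<String> diff(String reference, String recognized) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(recognized);

        // lcs[i][j] = length of the longest common subsequence of ref[i..] and hyp[j..]
        int[][] lcs = new int[ref.length + 1][hyp.length + 1];
        for (int i = ref.length - 1; i >= 0; i--) {
            for (int j = hyp.length - 1; j >= 0; j--) {
                lcs[i][j] = ref[i].equals(hyp[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both word sequences, following the LCS table to classify mismatches.
        List<String> miscues = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < ref.length && j < hyp.length) {
            if (ref[i].equals(hyp[j])) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                miscues.add("Omission: " + ref[i++]);   // reference word was not spoken
            } else {
                miscues.add("Insertion: " + hyp[j++]);  // spoken word is not in the reference
            }
        }
        while (i < ref.length) miscues.add("Omission: " + ref[i++]);
        while (j < hyp.length) miscues.add("Insertion: " + hyp[j++]);
        return miscues;
    }

    public static void main(String[] args) {
        System.out.println(diff(
                "Hello, could I buy one of these books you are selling",
                "Hello, could I buy one these books you are selling"));
        // prints [Omission: of]
    }
}
```

For the utterance above, the missing "of" is the one omission I would expect the assessment to report. A comparison like this only looks at the recognized text, so it cannot judge pronunciation quality.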

I have also tried the following example: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechRecognitionSamples.java#L985 (the pronunciationAssessmentWithMicrophoneAsync() function).

But even this example, adapted from the Microsoft Azure GitHub repo, never returns anything useful, and when it does return, the output is:

    CANCELED: ErrorCode=ServiceTimeout
    CANCELED: ErrorDetails=Timeout: no recognition result received SessionId: 09f1a5492851429e81e4672c90144a37
    CANCELED: Did you update the subscription info?

**EDIT:**
I have found out that this happens when pronunciation assessment is used in a noisy environment; the exact same code in the same environment does return a result without pronunciation assessment.
