
For a project, I am trying to use Azure's pronunciation assessment in Java to assess how the user pronounces the words and whether they pronounce them well. Currently I am using an approach based on event listeners:

    System.out.println("Starting recording with " + this.prompt);
    PronunciationAssessmentConfig pronunciationAssessmentConfig = new PronunciationAssessmentConfig(this.getPrompt(),
            PronunciationAssessmentGradingSystem.HundredMark, PronunciationAssessmentGranularity.Phoneme, true);
    // PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"" + getPrompt() + "\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\", \"miscue\":true}");
    AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
    SpeechUtil.SPEECH_CONFIG.setOutputFormat(OutputFormat.Detailed);

    SpeechRecognizer speechRecognizer = new SpeechRecognizer(SpeechUtil.SPEECH_CONFIG, audioConfig);

    pronunciationAssessmentConfig.applyTo(speechRecognizer);

    // Interim results: update the UI and stop once the recognized word count reaches the reference word count.
    speechRecognizer.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
        try {
            final int words = speechRecognitionResultEventArgs.getResult().getText().split(" ").length;
            System.out.println("Recognizing: " + speechRecognitionResultEventArgs.getResult().getText());
            Platform.runLater(() -> this.controller.setSpoken(0, wordsReadToIndex(words)));
            System.out.println(words);
            System.out.println(pronunciationAssessmentConfig.getReferenceText());
            System.out.println(speechRecognitionResultEventArgs.getResult().getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult));

            if (words >= pronunciationAssessmentConfig.getReferenceText().split(" ").length)
                speechRecognizer.stopContinuousRecognitionAsync();

        } catch (Throwable e) {
            e.printStackTrace();
        }
    });

    // Final result for a phrase, including the pronunciation assessment scores.
    speechRecognizer.recognized.addEventListener((o, speechRecognitionEventArgs) -> {
        System.out.println("Recognized!");
        try {
            PronunciationAssessmentResult pronunciationAssessmentResult =
                    PronunciationAssessmentResult.fromResult(speechRecognitionEventArgs.getResult());
            if (pronunciationAssessmentResult == null) return;
            System.out.println(pronunciationAssessmentResult.getAccuracyScore());
            String jsonString = speechRecognitionEventArgs.getResult().getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult);
            System.out.println(jsonString);
            Platform.runLater(() -> this.complete(new Score(pronunciationAssessmentResult.getAccuracyScore().intValue(), JsonParser.parseString(jsonString))));
            speechRecognizer.stopContinuousRecognitionAsync();
        } catch (Throwable e) {
            e.printStackTrace();
        }
    });

    speechRecognizer.speechEndDetected.addEventListener((o, speechRecognitionEventArgs) -> {
        System.out.println("Speech end detected!");
    });

    // Start only after all listeners are registered, so that events raised early are not missed.
    speechRecognizer.startContinuousRecognitionAsync();
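Side note on the `recognizing` handler: it counts words with `getText().split(" ").length`, which has two edge cases in Java. `"".split(" ")` returns a one-element array, so an empty interim result counts as one word, and runs of spaces produce empty tokens (`"a  b".split(" ")` has length 3). A minimal sketch of a more robust counter, assuming whitespace-separated words are good enough; the class and method names are hypothetical, not part of the Speech SDK:

```java
import java.util.regex.Pattern;

public class WordCount {
    // Matches any run of whitespace (spaces, tabs, newlines).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Hypothetical helper: counts whitespace-separated words, returning 0 for null or blank text. */
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split(" ").length would report 1 here
        }
        return WHITESPACE.split(trimmed).length;
    }

    public static void main(String[] args) {
        System.out.println("".split(" ").length);             // 1
        System.out.println(countWords(""));                   // 0
        System.out.println("a  b".split(" ").length);         // 3
        System.out.println(countWords("  Hello,  could I ")); // 3
    }
}
```

In the handler this would replace the `split(" ")` calls, e.g. `countWords(speechRecognitionResultEventArgs.getResult().getText())` and `countWords(pronunciationAssessmentConfig.getReferenceText())`.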

However, it does not detect any omissions or insertions, even though miscue detection is enabled in the assessment config (the last constructor argument, `true`). It also never detects the end of speech: the `speechEndDetected` handler never fires.

*Result of saying 'Hello, could I buy one these books you are selling' with reference 'Hello, could I buy one of these books you are selling':* https://pastebin.com/tvAVw4s3
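To confirm what the expected miscues are, independently of the service, the reference text and the recognized text can be aligned word by word. Below is a minimal sketch of that idea using a longest-common-subsequence alignment. It is a swapped-in, client-side technique, not how Azure computes its error types; the class and method names are hypothetical, and lower-casing plus stripping punctuation before comparison is an assumption. For the example above it should report `of` as an omission.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MiscueCheck {
    /** Lower-cases, keeps only letters, digits and apostrophes, and splits on whitespace. */
    static List<String> normalize(String text) {
        List<String> words = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            String w = raw.replaceAll("[^\\p{L}\\p{N}']", "");
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Lists "Omission: w" / "Insertion: w" entries from an LCS alignment of reference vs. recognized words. */
    static List<String> miscues(String reference, String recognized) {
        List<String> ref = normalize(reference);
        List<String> hyp = normalize(recognized);
        int n = ref.size(), m = hyp.size();
        // lcs[i][j] = length of the longest common subsequence of ref[i..] and hyp[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = ref.get(i).equals(hyp.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table: matching words advance both lists; otherwise skip the side that keeps the LCS longer.
        List<String> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (ref.get(i).equals(hyp.get(j))) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                result.add("Omission: " + ref.get(i++));
            } else {
                result.add("Insertion: " + hyp.get(j++));
            }
        }
        while (i < n) result.add("Omission: " + ref.get(i++));
        while (j < m) result.add("Insertion: " + hyp.get(j++));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(miscues(
                "Hello, could I buy one of these books you are selling",
                "Hello, could I buy one these books you are selling")); // [Omission: of]
    }
}
```

The recognized text could come from `getResult().getText()` in the `recognized` handler and the reference from `pronunciationAssessmentConfig.getReferenceText()`. This only checks the words that were recognized; it says nothing about pronunciation quality.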

I have also tried the following example (the `pronunciationAssessmentWithMicrophoneAsync()` function): https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechRecognitionSamples.java#L985

but even this example, adapted from the Microsoft Azure GitHub repo, never returns anything useful. When it returns anything at all, it is:

    CANCELED: ErrorCode=ServiceTimeout
    CANCELED: ErrorDetails=Timeout: no recognition result received SessionId: 09f1a5492851429e81e4672c90144a37
    CANCELED: Did you update the subscription info?

**EDIT:** I have found out that this happens because of the combination of a noisy environment and pronunciation assessment: with the exact same code and environment but without pronunciation assessment, recognition does return a result.
rowan-vr
