I intend to use sphinx4 to convert speech to text. I've been reading tutorials and reviews on how to improve accuracy, and I'm using the following setup:

I use a generic acoustic model and a generic language model because I don't know in advance which words will be spoken.

And I'm using the following code:

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class Example {

    // Locations of the generic US English acoustic model, dictionary and language model.
    private static final String ACOUSTIC_MODEL =
        "file:/Users/Jimo/Testing/models/acoustic/acoustic_model_us";
    private static final String DICTIONARY_PATH =
        "file:/Users/Jimo/Testing/models/acoustic/acoustic_model_us/dict/cmudict.0.6d";
    private static final String LANGUAGE_MODEL =
        "file:/Users/Jimo/Testing/models/language/en-us.lm.dmp";

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath(ACOUSTIC_MODEL);
        configuration.setDictionaryPath(DICTIONARY_PATH);
        configuration.setLanguageModelPath(LANGUAGE_MODEL);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
        // Start recognition process pruning previously cached data.
        recognizer.startRecognition(true);
        // Wait for a single utterance from the microphone and print its best hypothesis.
        SpeechResult result = recognizer.getResult();
        System.out.println(result.getHypothesis());
        // Pause recognition process. It can be resumed then with startRecognition(false).
        recognizer.stopRecognition();
    }
}

The response is slow, probably because of the size of the language model, and I almost never get the expected result. For example, if I say "Hello", the output is "Oh".

Am I doing something wrong? I know that improving accuracy requires a specific language model, but that approach would make this application impractical to use.
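For comparison, here is a minimal sketch of what a restricted vocabulary could look like, assuming the expected words were known in advance. It builds a small JSGF grammar from a word list using only the Java standard library. The class name `JsgfGrammarSketch`, the method `buildGrammar`, the rule name `<command>` and the example words are all hypothetical and are not part of sphinx4.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of sphinx4: builds a JSGF grammar that
// accepts exactly one word from a fixed list.
class JsgfGrammarSketch {

    static String buildGrammar(String grammarName, List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("at least one word is required");
        }
        // JSGF separates alternatives with '|'.
        String alternatives = String.join(" | ", words);
        return "#JSGF V1.0;\n"
            + "grammar " + grammarName + ";\n"
            + "public <command> = " + alternatives + ";\n";
    }

    public static void main(String[] args) {
        // Example vocabulary; these words are an assumption for illustration.
        System.out.print(buildGrammar("commands", Arrays.asList("hello", "goodbye", "stop")));
    }
}
```

If the output were saved as `commands.gram`, it could presumably be used in place of the language model through `setGrammarPath`, `setGrammarName("commands")` and `setUseGrammar(true)` on the `Configuration` object, provided every word is present in the dictionary. This only helps when the vocabulary really is limited, which is exactly the constraint described above.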

David Foerster