I am trying to use the following code to get word-level results from an audio file with Sphinx, but it does not return any words. Can someone help?

Here is the wav audio: http://download.wavetlan.com/SVV/Media/HTTP/OtherWAV2.wav

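import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;
import java.io.FileInputStream;
import java.io.IOException;

Configuration configuration = new Configuration();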

// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

StreamSpeechRecognizer recognizer;
try {
    recognizer = new StreamSpeechRecognizer(configuration);

    recognizer.startRecognition(new FileInputStream("1.wav"));
    SpeechResult result = recognizer.getResult();
    recognizer.stopRecognition();


    // Print utterance string without filler words.
    System.out.println(result.getHypothesis());

    System.out.println("================word result=============="+result.getWords().size());
    // Get individual words and their times.
    for (WordResult r : result.getWords()) {
        System.out.println(r);
    }
} catch (IOException e) {
    // Model loading or reading the audio file failed.
    e.printStackTrace();
}

Below is the output:

19:12:30.264 INFO lexTreeLinguist      Max CI Units 43
19:12:30.264 INFO lexTreeLinguist      Unit table size 79507
19:12:30.273 INFO speedTracker         # ----------------------------- Timers----------------------------------------
19:12:30.273 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
19:12:30.273 INFO speedTracker         Compile              1       1.4020s   1.4020s   1.4020s   1.4020s   1.4020s   
19:12:30.273 INFO speedTracker         Load LM              1       0.6420s   0.6420s   0.6420s   0.6420s   0.6420s   
19:12:30.273 INFO speedTracker         Load Dictionary      1       0.0880s   0.0880s   0.0880s   0.0880s   0.0880s   
19:12:30.273 INFO speedTracker         Load AM              1       1.7740s   1.7740s   1.7740s   1.7740s   1.7740s   
19:12:30.294 INFO speedTracker            This  Time Audio: 1.38s  Proc: 0.01s  Speed: 0.00 X real time
19:12:30.295 INFO speedTracker            Total Time Audio: 1.38s  Proc: 0.01s 0.00 X real time
19:12:30.295 INFO memoryTracker           Mem  Total: 840.50 Mb  Free: 584.33 Mb
19:12:30.295 INFO memoryTracker           Used: This: 256.17 Mb  Avg: 256.17 Mb  Max: 256.17 Mb
19:12:30.295 INFO trieNgramModel       LM Cache Size: 0 Hits: 0 Misses: 0
19:12:30.314 INFO speedTracker         # ----------------------------- Timers----------------------------------------
19:12:30.314 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
19:12:30.314 INFO speedTracker         Compile              1       1.4020s   1.4020s   1.4020s   1.4020s   1.4020s   
19:12:30.314 INFO speedTracker         Load LM              1       0.6420s   0.6420s   0.6420s   0.6420s   0.6420s   
19:12:30.314 INFO speedTracker         Load Dictionary      1       0.0880s   0.0880s   0.0880s   0.0880s   0.0880s   
19:12:30.314 INFO speedTracker         Score                2       0.0000s   0.0000s   0.0080s   0.0040s   0.0080s   
19:12:30.315 INFO speedTracker         Prune                5       0.0000s   0.0000s   0.0000s   0.0000s   0.0000s   
19:12:30.315 INFO speedTracker         Grow                 7       0.0000s   0.0000s   0.0040s   0.0007s   0.0050s   
19:12:30.315 INFO speedTracker         Frontend             2       0.0000s   0.0000s   0.0080s   0.0040s   0.0080s   
19:12:30.315 INFO speedTracker         Load AM              1       1.7740s   1.7740s   1.7740s   1.7740s   1.7740s   
19:12:30.315 INFO speedTracker            Total Time Audio: 1.38s  Proc: 0.01s 0.00 X real time
19:12:30.315 INFO memoryTracker           Mem  Total: 840.50 Mb  Free: 584.33 Mb
19:12:30.315 INFO memoryTracker           Used: This: 256.17 Mb  Avg: 256.17 Mb  Max: 256.17 Mb

================word result==============0
barryhunter

1 Answer

The audio must have the following format:

RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

Your audio has this format:

RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz

It cannot be decoded with the default model. This audio also cannot be converted to a proper format, because its sample rate is lower than 16000 Hz and it has only 8 bits per sample instead of 16. You need to make sure that you convert the original audio into the proper format before decoding.
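
To check a file before passing it to the recognizer, something like the following sketch may help. It uses only the standard `javax.sound.sampled` API and does not depend on Sphinx. The class name `WavFormatCheck` and the helper `isSphinxCompatible` are hypothetical names introduced here for illustration, and the default file name `1.wav` is taken from the question.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

public class WavFormatCheck {

    // Hypothetical helper: true if the format matches what the answer above
    // says is required: signed 16-bit PCM, mono, 16000 Hz, little-endian.
    static boolean isSphinxCompatible(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && f.getSampleRate() == 16000f
                && !f.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        // Path comes from the first argument; "1.wav" is the file name used in the question.
        File wav = new File(args.length > 0 ? args[0] : "1.wav");
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            System.out.println("Format: " + f);
            System.out.println("Compatible: " + isSphinxCompatible(f));
        }
    }
}
```

For an 8-bit, 11025 Hz file such as the one described above, the check should report the file as not compatible, since both the sample size and the sample rate differ from the required values.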

Nikolay Shmyrev