On my system, with a USB microphone, I've found that the audio level that works best with CMU Sphinx is about 20% of the maximum. This gives me 75% voice recognition accuracy. If I amplify the signal digitally, I get far worse recognition accuracy (25%). Why is this? What is the recommended audio level for Sphinx? [I am also using 16,000 samples/sec, 16-bit.]
1 Answer
The pocketsphinx decoder uses channel amplitude normalization. The initial normalization value is indeed configured for a 20% audio level inside the model (the -cmninit parameter in feat.params). However, the value is updated as you decode, so it only affects the first utterance. If you decode properly in continuous mode, the level should not matter. Do not restart the recognizer for every utterance; let it adapt to the noise and audio level.
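
To illustrate why the initial value only hurts the first utterance, here is a toy sketch in plain Python. It is not pocketsphinx code and does not use its API: the class `RunningMeanNormalizer`, the helper `residual_offsets`, the `prior_weight` blending rule, and the three-frame "utterance" are all assumptions made up for the example. It relies on one general fact, that multiplying the signal by a gain adds a constant to log-energy-style features, and on the reading that -cmninit seeds a running mean that is subtracted from those features.

```python
import math


class RunningMeanNormalizer:
    """Toy stand-in for live-mode mean normalization (hypothetical, not pocketsphinx)."""

    def __init__(self, init_mean, prior_weight=0.5):
        self.mean = init_mean              # plays the role of the -cmninit seed
        self.prior_weight = prior_weight   # assumed blending rule, for illustration only

    def normalize(self, frames):
        # Subtract the current estimate, then update it from this utterance.
        out = [x - self.mean for x in frames]
        utt_mean = sum(frames) / len(frames)
        self.mean = (self.prior_weight * self.mean
                     + (1 - self.prior_weight) * utt_mean)
        return out


def residual_offsets(gain, n_utterances=6, init_mean=0.0):
    """Mean of the normalized features per utterance when the input is scaled by `gain`."""
    base = [-1.0, 0.0, 1.0]    # toy log-energy frames, mean 0 at the level the seed expects
    shift = math.log(gain)     # a digital gain becomes an additive offset in the log domain
    norm = RunningMeanNormalizer(init_mean)
    offsets = []
    for _ in range(n_utterances):
        out = norm.normalize([x + shift for x in base])
        offsets.append(sum(out) / len(out))
    return offsets


if __name__ == "__main__":
    # Same decoder instance kept alive across utterances, input amplified 5x (20% -> 100%).
    for i, off in enumerate(residual_offsets(gain=5.0), start=1):
        print(f"utterance {i}: residual offset {off:.3f}")
```

In this toy model the mismatch is largest on the first utterance and shrinks on every later one, as long as the same normalizer object is reused. Creating a new normalizer per utterance (the analogue of restarting the recognizer) would reset the estimate to the seed each time, so the amplified input would stay mismatched.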

Nikolay Shmyrev