
I'm creating a small application that requires a live feed of phonemes to be output as the user speaks into their microphone. In my case, the speed of the recognition output is the number one priority, even over accuracy. C# is the preference, but if better speed can be achieved with a different language and/or library (like CMUSphinx), I would switch.

Using System.Speech.Recognition, along with DictationGrammar("grammar:dictation#pronunciation"), I've been able to create a simple and effective phoneme recognizer that outputs phonemes as you speak into the mic, with generally impressive accuracy (subscribing to the SpeechRecognitionEngine.SpeechHypothesized event lets me see live output). The problem is that it has a minimum delay of around 0.5 s between the user speaking and the output, which is too much for this project. I know that in general this is fairly fast, especially considering the good accuracy, but I really need something faster, even if the accuracy takes a big hit. Is there any way to configure a SpeechRecognitionEngine to throw accuracy out the window in order to produce hypotheses faster? I found some exposed settings via SpeechRecognitionEngine.UpdateRecognizerSetting, but they seem to have little effect on the output for phoneme recognition.
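For reference, a minimal sketch of the kind of setup described above. It assumes a Windows machine with an en-US recognizer installed, a working default microphone, and a project reference to System.Speech; the class name and the console output are illustrative only, and printing `e.Result.Text` is just one way to inspect each hypothesis.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Illustrative sketch: prints each hypothesis as the engine raises it.
// Assumes an en-US recognizer is installed and a default microphone exists.
class PhonemeFeed
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // The pronunciation dictation topic yields phonemes instead of words.
            engine.LoadGrammar(new DictationGrammar("grammar:dictation#pronunciation"));
            engine.SetInputToDefaultAudioDevice();

            // Raised repeatedly while the user is still speaking.
            engine.SpeechHypothesized += (sender, e) =>
                Console.WriteLine("hypothesis: " + e.Result.Text);

            // Raised once the engine settles on a final result for the phrase.
            engine.SpeechRecognized += (sender, e) =>
                Console.WriteLine("final:      " + e.Result.Text);

            // Keep recognizing until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```

The delay in question is the time between speaking and the first "hypothesis" line appearing.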

I've also looked into CMUSphinx, a free speech recognition project that looked promising. Sphinx4 was easy to compile and set up a test with in Java, but I couldn't figure out how to configure it to output phonemes live, and its word recognition was relatively slow. I also found some notes about phoneme recognition using their other project, pocketsphinx. I was able to download and compile it as well, but I was unable to run any tests successfully. Has anyone used CMUSphinx or pocketsphinx with phonemes? Is it capable of fast, live output? Or are there other alternatives? I'm really looking for something extremely basic, but fast.

Edit: I was able to get pocketsphinx recognizing phonemes, but it was too slow to use in the project.

bk999
  • Add `-allphone_ci yes` to pocketsphinx, it will be fast. Overall, phoneme recognition is never a great idea. – Nikolay Shmyrev Apr 13 '19 at 19:09
  • @NikolayShmyrev thanks, that did greatly improve the speed! It might be fast enough now; however, it still waits until there is a period of silence before outputting the results, so I can't tell for sure. Is there a way to get a "live stream" of output as it happens, so that each individual phoneme is output the moment it is hypothesized? To test, I compiled pocketsphinx and am just running it via a shell with this: ```pocketsphinx_continuous.exe -inmic yes -hmm model/en-us/en-us -allphone_ci yes -allphone en-us-phone.lm.bin -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0``` – bk999 Apr 14 '19 at 18:40
