I am working on a game for iPhone that can be played entirely by giving YES / NO responses.
It would be great to make this game available to blind users, runners, and people driving cars by allowing voice control. This does not require full speech recognition; I am only looking to implement keyword spotting.
I can already detect the start and stop of utterances, and have implemented this in FDSoundActivatedRecorder (https://github.com/fulldecent/FDSoundActivatedRecorder). The next step is to distinguish between YES and NO responses reliably for a wide variety of users.
THE QUESTION: For reasonable performance (distinguish YES / NO / STOP within 0.5 sec after speech stops), is AVAudioRecorder a reasonable choice? Is there a published algorithm that meets these needs?