I'm using PocketSphinx on Android. After the recognizer initializes, I start a keyword listener. At first, the recognizer does not match anything. After a few seconds, though, it starts matching keywords with excellent performance (about a 3% WER in initial testing).

The time it takes to start matching depends on the word or phrase, and also seems to depend on how many times it is spoken. For instance, "plus" is matched very quickly, usually on the first or second utterance, taking about 2 seconds on average. "A little help please", on the other hand, takes around 10 seconds, or about 8-10 utterances. Once any keyword has been matched, Sphinx enters its high-performance mode for all keywords. So one workaround (although not a very good one) is to say "plus" immediately after initialization completes.

During the period in which nothing is matched, Sphinx still calls onBeginningOfSpeech() and onEndOfSpeech() for each utterance of a keyword or key phrase, so the speech itself is being detected.
Keyword file:
cancel last
a little help please
add new cut/1e-35/
set material
set quantity
plus/5e-2/
minus/5e-2/
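Only three of the seven entries above carry an explicit threshold; the others fall back to whatever default the recognizer is configured with. The following is a minimal sketch, not part of PocketSphinx, that parses a list in this `phrase/threshold/` layout and shows which phrases have no explicit threshold. The class name `KeywordList` and the method `parse` are made up for illustration, and the parsing rule is an assumption based only on the entries shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting a keyword list; not a PocketSphinx API.
public class KeywordList {
    // Matches "phrase/threshold/", allowing whitespace before the first slash.
    private static final Pattern WITH_THRESHOLD =
            Pattern.compile("^(.*?)\\s*/([^/]+)/\\s*$");

    /** Maps each phrase to its explicit threshold, or to null when none is given. */
    public static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> entries = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            Matcher m = WITH_THRESHOLD.matcher(line);
            if (m.matches()) {
                entries.put(m.group(1).trim(), Double.parseDouble(m.group(2).trim()));
            } else {
                entries.put(line, null); // no explicit threshold on this entry
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "cancel last",
                "a little help please",
                "add new cut/1e-35/",
                "set material",
                "set quantity",
                "plus/5e-2/",
                "minus/5e-2/");
        for (Map.Entry<String, Double> e : parse(lines).entrySet()) {
            String threshold = e.getValue() == null
                    ? "(no explicit threshold)"
                    : String.valueOf(e.getValue());
            System.out.println(e.getKey() + " -> " + threshold);
        }
    }
}
```

Running `main` on the list above reports "cancel last", "a little help please", "set material" and "set quantity" as having no explicit threshold.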
I'm using pocketsphinx-android-5prealpha-nolib.jar and, in case it makes a difference, have tested on a Samsung Galaxy S3 and a Motorola Moto E (2nd Gen). The problem is the same whether or not I use a headset.