A quote from the tutorial:
A developer can configure several “search” objects with different grammars and language models and switch between them at runtime to provide an interactive experience for the user.
There are different possible search modes:
- keyword - efficiently looks for a keyphrase and ignores other speech. Lets you configure the detection threshold.
- grammar - recognizes speech according to a JSGF grammar. Unlike keyphrase search, grammar search doesn't ignore words which are not in the grammar but tries to recognize them.
- ngram/lm - recognizes natural speech with a language model.
- allphone - recognizes phonemes with a phonetic language model.
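As a rough illustration of the per-keyphrase detection threshold mentioned for keyword search, the sketch below renders a keyword list in the one-phrase-per-line `phrase /threshold/` form. The helper name `format_keyword_list`, the phrases and the threshold values are all made up for this example; real thresholds have to be tuned on your own recordings.

```python
def format_keyword_list(thresholds):
    """Render {phrase: threshold} as keyword-list text, one phrase per line.

    Each line has the form "phrase /threshold/".
    """
    lines = []
    for phrase, threshold in thresholds.items():
        # %g keeps small thresholds compact, e.g. 1e-40 rather than 0.000...
        lines.append('%s /%g/' % (phrase, threshold))
    return '\n'.join(lines) + '\n'


# Hypothetical phrases and thresholds, for illustration only
example = format_keyword_list({
    'oh mighty computer': 1e-40,
    'hello world': 1e-30,
})
```

Writing the returned text to a file such as keyword.list gives something that a keyword search can be pointed at.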
Each search has a name and can be referenced by that name; names are application-specific. The function ps_set_search activates a search previously added under a given name.
To add a search, you need to point to the grammar or language model describing it. The location of the grammar is specific to the application. If only simple recognition is required, it is sufficient to add a single search or just to configure the required mode with configuration options.
The exact design of the searches depends on your application. For example, you might want to listen for an activation keyword first and, once the keyword is recognized, switch to ngram search to recognize the actual command. Once you have recognized the command, you can switch to grammar search to recognize the confirmation and then switch back to keyword listening mode to wait for another command.
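The keyword, command, confirmation cycle described above can be sketched as a small transition table that is independent of the decoder. The search names ('keyword', 'lm', 'confirm') and the helper `next_search` are assumptions made for this example; they only need to match whatever names you used when adding the searches.

```python
# Hypothetical search names; use the names your searches were added under.
NEXT_SEARCH = {
    'keyword': 'lm',       # activation keyword heard: listen for a command
    'lm': 'confirm',       # command heard: ask for confirmation (grammar search)
    'confirm': 'keyword',  # confirmation heard: wait for the keyword again
}


def next_search(current, recognized):
    """Return the name of the search to activate after an utterance ends.

    If nothing was recognized, stay in the current search.
    """
    if not recognized:
        return current
    return NEXT_SEARCH[current]
```

After each utterance, the result of `next_search(current_name, hypothesis_found)` would be passed to the call that activates a search. Keeping the transitions in a table makes it easy to add or reorder stages without touching the audio loop.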
The code to switch searches in Python looks like this:
# Written against the older SWIG-based pocketsphinx Python bindings
from os import path

import pyaudio
from pocketsphinx.pocketsphinx import Decoder

# Assumed location of the models; adjust to your installation
MODELDIR = 'pocketsphinx/model'

# Init decoder
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
decoder = Decoder(config)

# Add searches and activate the keyword search first
decoder.set_kws('keyword', 'keyword.list')
decoder.set_lm_file('lm', 'query.lm')
decoder.set_search('keyword')

# Open the microphone: 16 kHz, mono, 16-bit samples
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()

in_speech_bf = False
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        # React only when the speech/silence state changes
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                # Speech just ended: close the utterance and read the result
                decoder.end_utt()
                hyp = decoder.hyp()
                if hyp is not None:
                    # Print hypothesis and switch search to another mode
                    print('Result:', hyp.hypstr)
                    if decoder.get_search() == 'keyword':
                        decoder.set_search('lm')
                    else:
                        decoder.set_search('keyword')
                decoder.start_utt()
    else:
        break
decoder.end_utt()