
I'm using pocketsphinx on a Raspberry Pi for home automation. I've written a simple JSGF grammar file with the supported commands. Now I want to require an activation phrase such as "hey computer" before the commands, to avoid false detections and to run speech recognition only after the activation phrase has been spoken.

If I understand correctly, pocketsphinx supports two speech recognition modes: keyword spotting mode and language model / JSGF grammar mode.

The pocketsphinx FAQ, addressing how to reject out-of-grammar words, says:

If you want to recognize several commands, you can use keyword spotting mode or keyword activation mode combined with the switch to grammar to perform actual operation.

My question is, how exactly is this "switching" from keyword spotting mode to grammar mode implemented? (what should I do to achieve it?). Related to that, what's the difference between "keyword spotting mode" and "keyword activation mode"?

Thanks!

jotadepicas

1 Answer


A quote from the tutorial:

A developer can configure several "search" objects with different grammars and language models and switch between them at runtime to provide an interactive experience for the user.

There are different possible search modes:

  • keyword - efficiently looks for a keyphrase and ignores other speech. Allows configuring the detection threshold.
  • grammar - recognizes speech according to a JSGF grammar. Unlike keyphrase search, grammar search doesn't ignore words that are not in the grammar but tries to recognize them.
  • ngram/lm - recognizes natural speech with a language model.
  • allphone - recognizes phonemes with a phonetic language model.

Each search has a name and can be referenced by that name; names are application-specific. The function ps_set_search activates a previously added search by its name.
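To make the name-based switching concrete, here is a toy pure-Python model of it. The `SearchRegistry` class is hypothetical and only mimics the behaviour described above (register searches under names, then activate one by name); it is not part of the pocketsphinx API.

```python
class SearchRegistry:
    """Toy model of named searches. NOT the pocketsphinx API."""

    def __init__(self):
        self._searches = {}  # name -> (mode, file describing the search)
        self._active = None

    def add(self, name, mode, source):
        # Registering a search does not activate it.
        self._searches[name] = (mode, source)

    def set_search(self, name):
        # Only a previously added search can be activated;
        # in this toy model an unknown name raises KeyError.
        if name not in self._searches:
            raise KeyError(f"no search named {name!r}")
        self._active = name

    def get_search(self):
        return self._active

    def mode(self):
        return self._searches[self._active][0]


registry = SearchRegistry()
registry.add('keyword', 'kws', 'keyword.list')
registry.add('commands', 'jsgf', 'commands.gram')
registry.set_search('keyword')
print(registry.get_search(), registry.mode())  # keyword kws
registry.set_search('commands')
print(registry.get_search(), registry.mode())  # commands jsgf
```

Several searches can be registered up front; switching between them is then just a matter of activating a different name.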

To add a search, one needs to point to the grammar/language model describing it. The location of the grammar is specific to the application. If only simple recognition is required, it is sufficient to add a single search or just configure the required mode with configuration options.
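As a sketch of the files such searches point to, the helper below writes a keyword list (one keyphrase per line, followed by its detection threshold between slashes, e.g. `hey computer /1e-20/`) and a minimal JSGF grammar. The file names, the phrase, the threshold value and the grammar rule are illustrative assumptions; thresholds in particular have to be tuned on your own recordings.

```python
import tempfile
from pathlib import Path


def write_search_files(directory, keyphrases, grammar_body):
    """Write keyword.list and commands.gram into `directory` (hypothetical names).

    keyphrases:   list of (phrase, threshold) pairs for keyword spotting.
    grammar_body: the JSGF rule definitions, without the header.
    """
    directory = Path(directory)
    kws = directory / 'keyword.list'
    gram = directory / 'commands.gram'
    # One keyphrase per line, each followed by its threshold in slashes.
    kws.write_text(''.join(f'{phrase} /{threshold}/\n' for phrase, threshold in keyphrases))
    # JSGF header and grammar name, then the public rule(s).
    gram.write_text('#JSGF V1.0;\n\ngrammar commands;\n\n' + grammar_body + '\n')
    return kws, gram


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        kws, gram = write_search_files(
            tmp,
            [('hey computer', 1e-20)],  # assumed starting threshold, tune it
            'public <command> = (turn on | turn off) the (light | fan);',
        )
        print(kws.read_text(), end='')
        print(gram.read_text(), end='')
```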

The exact design of the searches depends on your application. For example, you might want to listen for an activation keyword first and, once the keyword is recognized, switch to ngram search to recognize the actual command. Once you have recognized the command, you can switch to grammar search to recognize the confirmation and then switch back to keyword listening mode to wait for another command.
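The keyword → command → confirmation cycle described above is a small state machine. The sketch below expresses it as a pure function so it can be tested without a microphone; the search names are assumptions (`'keyword'` and `'lm'` match the code that follows, `'confirm'` is a hypothetical name for a confirmation grammar search).

```python
# Hypothetical search names; they must match the names registered with the decoder.
KEYWORD, COMMAND, CONFIRM = 'keyword', 'lm', 'confirm'


def next_search(current, hypstr):
    """Return the search to activate after an utterance ends.

    current: name of the search that was active for the utterance.
    hypstr:  recognized text, or None if nothing was recognized.
    """
    if current == KEYWORD:
        # Keep spotting until the activation phrase is actually heard.
        return COMMAND if hypstr else KEYWORD
    if current == COMMAND:
        # A recognized command goes to confirmation; otherwise wait for the keyword again.
        return CONFIRM if hypstr else KEYWORD
    # After the confirmation utterance, go back to waiting for the keyword.
    return KEYWORD


search = KEYWORD
for text in ['hey computer', 'turn on the light', 'yes']:
    new = next_search(search, text)
    print(f'{search!r} heard {text!r} -> {new!r}')
    search = new
```

Keeping the transition logic separate from the audio loop makes it easy to add states (or drop the confirmation step) without touching the decoding code.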

The code to switch searches in Python looks like this:

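```python
# Uses the pocketsphinx-python API of that time (set_kws, set_lm_file, set_search, ...);
# newer pocketsphinx releases renamed some of these calls.
from os import path

import pyaudio
from pocketsphinx.pocketsphinx import Decoder

# Directory containing the en-us acoustic model and dictionary; adjust to your installation
MODELDIR = 'pocketsphinx/model'

# Init decoder
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
decoder = Decoder(config)

# Add searches: keyword spotting for the activation phrase, a language model for commands
decoder.set_kws('keyword', 'keyword.list')
decoder.set_lm_file('lm', 'query.lm')
decoder.set_search('keyword')

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()

in_speech_bf = False
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                # Speech just ended: finish the current utterance
                decoder.end_utt()

                # hyp() returns None when nothing was recognized
                hyp = decoder.hyp()
                if hyp is not None:
                    print('Result:', hyp.hypstr)

                # Switch to the command search only if the keyword was actually detected
                if decoder.get_search() == 'keyword':
                    if hyp is not None:
                        decoder.set_search('lm')
                else:
                    decoder.set_search('keyword')

                decoder.start_utt()
    else:
        break
decoder.end_utt()
```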
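The loop ends an utterance only when `get_in_speech()` flips from `True` to `False`. That edge detection can be pulled out into a pure function and checked against a list of booleans standing in for what `get_in_speech()` would report after each buffer; the function name is hypothetical and the boolean list is made-up test input, not decoder output.

```python
def utterance_end_indices(in_speech_flags):
    """Return the buffer indices at which speech stops (True -> False).

    in_speech_flags: iterable of booleans, one per audio buffer, standing in for
    the values decoder.get_in_speech() would report after each process_raw call.
    """
    ends = []
    in_speech_bf = False  # same starting state as in the loop
    for i, flag in enumerate(in_speech_flags):
        if flag != in_speech_bf:
            in_speech_bf = flag
            if not in_speech_bf:
                # This is where the loop calls end_utt() and switches searches.
                ends.append(i)
    return ends


print(utterance_end_indices([False, True, True, False, False, True, False]))  # [3, 6]
```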
Nikolay Shmyrev
  • Thank you Nikolay! I'll try that. You know, as a general comment, I've seen more and more people, myself included, that try to do everything using `pocketsphinx_continuous` executable, as if it were the "swiss army knife" of pocketsphinx. But what I see is that it is just an example program and should be treated as such, and if you want to do more advanced things, you should program your own software using sphinx library/API/etc. Is my appreciation correct? – jotadepicas Aug 22 '16 at 16:19
  • This is correct, it is definitely not a swiss army knife. – Nikolay Shmyrev Aug 22 '16 at 16:43
  • @NikolayShmyrev I know this question is a bit old, but do you think you could help me with a question? I have some C++ code, I've had a hard time running this script from it, and I wanted to know if there's a C++ version of this script? – Marco Neves Mar 30 '17 at 02:48
  • @MarcoNeves, you can copy-paste code from `pocketsphinx_continuous` from `src/programs/continuous.c` in your application and it will work. – Nikolay Shmyrev Mar 30 '17 at 06:01
  • @NikolayShmyrev I found the file, but I'm not sure what you meant by copy/paste the code into my application. Do you mean making a copy of it inside my workspace? And how would I call it within my program? – Marco Neves Mar 30 '17 at 06:46
  • Just like you call any other C method. – Nikolay Shmyrev Mar 30 '17 at 10:39
  • what does the .list file consist of? – Justin Furuness Sep 16 '20 at 19:15