
Is it possible to use the same decoder for multiple wav files in Pocketsphinx (Python)? I have the following code snippet, which is very standard, except that I call the decoder twice on the same file. The outputs are not the same, however. I've also tried using the decoder twice on different files, and the outputs are different depending on the order in which I call the files: the first file decodes correctly, but the second file does not. Furthermore, this only happens if there is some output from the first file; if the first file doesn't have any words, then the second file decodes fine. This makes me believe the decoder is modified in some way after decoding one file. Am I correct about this? Is there any way to reset the decoder, or in general make it work for multiple files? It seems like there should be, given the example here: https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/decoder_test.py

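# Assumes `import os` and `import pocketsphinx as ps` (alias as used below),
# with MODELDIR and DATADIR pointing at the model and test-data directories.
config = ps.Decoder.default_config()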
config.set_string('-hmm', os.path.join(MODELDIR, 'en-US/acoustic-model'))
config.set_string('-lm', os.path.join(MODELDIR, 'en-US/language-model.lm.bin'))
config.set_string('-dict', os.path.join(MODELDIR, 'en-US/pronounciation-dictionary.dict'))
config.set_string('-logfn', 'pocketsphinxlog')
decoder = ps.Decoder(config)

wavname16_1 =  os.path.join(DATADIR, 'arctic_a0001.wav')
# Decode streaming data.
decoder.start_utt()
stream = open(wavname16_1, 'rb')
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
decoder.end_utt()
stream.close()
words = [(seg.word, seg.prob) for seg in decoder.seg()]
print("arctic1: " + str(words))

wavname16_2 =  os.path.join(DATADIR, 'arctic_a0002.wav')
decoder.start_utt()
stream = open(wavname16_2, 'rb')
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
decoder.end_utt()
stream.close()
words = [(seg.word, seg.prob) for seg in decoder.seg()]
print("arctic2: " + str(words))  # str() needed: a list cannot be concatenated to a string

EDIT - Some further information:

If arctic_a0001.wav is http://festvox.org/cmu_arctic/cmu_arctic/cmu_us_bdl_arctic/wav/arctic_a0001.wav, arctic_a0002.wav is http://festvox.org/cmu_arctic/cmu_arctic/cmu_us_bdl_arctic/wav/arctic_a0002.wav, and the dictionary is the single line:

of AH V

then the current output is:

arctic1: [('<s>', 1), ('of', 1), ('of', -12001), ('<sil>', 0), ('of', -16211), ('<sil>', -1205), ('of', -13991), ('of', 0), ('<sil>', 0), ('of', -31232), ('</s>', 0)]
arctic2: [('<s>', -3), ('[SPEECH]', -725), ('<sil>', -1), ('[SPEECH]', -6), ('<sil>', -20), ('of', -6162), ('[SPEECH]', -397), ('</s>', 0)]

but if we switch the order of the two files, the output becomes:

arctic2: [('<s>', 0), ('of', 0), ('<sil>', 0), ('of', -29945), ('<sil>', -20), ('of', -26004), ('of', 0), ('of', 0), ('<sil>', 0), ('of', -84868), ('of', -35690), ('</s>', 0)]
arctic1: [('<s>', -3), ('of', -14886), ('of', -30237), ('<sil>', 0), ('of', -22103), ('of', 1), ('<sil>', 0), ('of', -30795), ('of', -65040), ('</s>', 0)]

so the outputs of arctic1 and arctic2 depend on the order. Furthermore, if we use arctic1 twice, the output is

[('<s>', 1), ('of', 1), ('of', -12001), ('<sil>', 0), ('of', -16211), ('<sil>', -1205), ('of', -13991), ('of', 0), ('<sil>', 0), ('of', -31232), ('</s>', 0)]
[('<s>', 1), ('of', -24424), ('of', -24554), ('<sil>', 2), ('[SPEECH]', -37257), ('of', -37008), ('<sil>', -461), ('of', -20422), ('of', 0), ('<sil>', 0), ('of', -3570), ('[SPEECH]', -42), ('</s>', 0)]

Maybe it is a problem with me not using start_stream()? I am not sure how I should use it. Even if I use decoder.start_stream() (directly before decoder.start_utt()), the output is different - it becomes

[('<s>', 1), ('of', 1), ('of', -12001), ('<sil>', 0), ('of', -16211), ('<sil>', -1205), ('of', -13991), ('of', 0), ('<sil>', 0), ('of', -31232), ('</s>', 0)]
[('<s>', -2), ('of', -33113), ('of', -29715), ('<sil>', 1), ('[SPEECH]', -37258), ('of', -37009), ('<sil>', -461), ('of', -20422), ('of', 0), ('<sil>', 0), ('of', -3570), ('[SPEECH]', -42), ('</s>', 0)]

If you want the entire log, here (http://pastebin.com/2dNeyS1x) is the log for arctic1 before arctic2, and here (http://pastebin.com/Nkvj2G0g) is the log for arctic2 before arctic1, while here is the log for arctic1 two times in a row with start_stream (http://pastebin.com/HWq6j7X2), and here is the log for arctic1 two times in a row without start_stream (http://pastebin.com/MsadW4nh).

1 Answer


Is it possible to use the same decoder for multiple wav files in Pocketsphinx (Python)?

Yes.

I have the following code snippet, which is very standard, except that I call the decoder twice on the same file. The outputs are not the same, however.

You need to call decoder.start_stream() for the second file to reset the decoder timings.
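As a rough sketch of that suggestion, the per-file loop from the question can be wrapped in a helper that calls start_stream() before start_utt() for every file. The helper name decode_files and its chunk_size parameter are hypothetical, introduced only for illustration; the decoder methods are the ones already used in the question. Note that the question's edit reports the second output still differed after adding start_stream(), so this shows the calling pattern rather than a confirmed fix.

```python
def decode_files(decoder, wav_paths, chunk_size=1024):
    """Decode each file as its own utterance using one shared decoder.

    `decoder` is assumed to expose start_stream(), start_utt(),
    process_raw(), end_utt() and seg(), like the Decoder in the question.
    `decode_files` itself is a hypothetical helper, not a library function.
    """
    results = []
    for path in wav_paths:
        decoder.start_stream()  # reset per-stream state before each new file
        decoder.start_utt()
        with open(path, 'rb') as stream:
            while True:
                buf = stream.read(chunk_size)
                if not buf:
                    break
                # Same arguments as in the question's snippet.
                decoder.process_raw(buf, False, False)
        decoder.end_utt()
        words = [(seg.word, seg.prob) for seg in decoder.seg()]
        results.append((path, words))
    return results
```

With the decoder built as in the question, this would be called as decode_files(decoder, [wavname16_1, wavname16_2]), and each entry of the returned list pairs a path with its (word, prob) segments.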

I've also tried using the decoder twice on different files, and the outputs are different depending on the order in which I call the files - the first file decodes correctly, but the second file does not decode correctly. Furthermore, this only happens if there is some output from the first file - if the first file doesn't have any words, then the second file decodes fine.

Well, several different things could affect the result. It is hard to say without an example. You should provide sample files and the problematic output to get an answer to this question.

Nikolay Shmyrev