Pocketsphinx decoder adds words of its own

Question

I am using Ubuntu 12.04, Python 2.7 & PocketSphinx.

I built a custom dictionary and language model with the online LM tool. Decoding speech with pocketsphinx_continuous gives me 100% accuracy. But when I record the sound with PyAudio in Python, the decoder recognises the text but adds 'A' and 'AND' around the main phrase, as the two screenshots show (one from PocketSphinx_Continuous, one from PocketSphinx in Python). How can I fix this?

Dictionary: https://dl.dropboxusercontent.com/u/69889915/8143.dic LM: https://dl.dropboxusercontent.com/u/69889915/8143.lm STT: https://dl.dropboxusercontent.com/u/69889915/STT.py — VeilEclipse, Apr 16 '13 at 07:34
Please provide audio file in question. Please provide files in a single archive to download, not a collection of links. — Nikolay Shmyrev, Apr 16 '13 at 07:54
Here is everything: https://dl.dropboxusercontent.com/u/69889915/Sample.tar.gz — VeilEclipse, Apr 16 '13 at 12:34

score 2 · Accepted Answer · answered Apr 16 '13 at 16:56

The accuracy drops because you've added an artificial zero-silenced region around the utterance, which corrupts the spectrum. Stop doing that. Instead, just recognize the sound you have recorded.

If you still need to decode zero-silenced regions, you need to add

 dither="yes"

option to the decoder arguments so that the decoder can work around them. Once you add this option, the result will be accurate.
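
For the first suggestion (recognizing only the recorded sound), here is a minimal sketch of how the zero padding could be stripped before the audio reaches the decoder. It is not taken from the asker's STT.py: the helper name `strip_zero_padding` is hypothetical, and it assumes the recording is raw mono 16-bit PCM in which the artificial silence consists of samples that are exactly zero.

```python
import array


def strip_zero_padding(pcm_bytes):
    """Return pcm_bytes without leading/trailing exact-zero 16-bit samples.

    Hypothetical helper for illustration; assumes raw mono 16-bit PCM.
    """
    samples = array.array("h", pcm_bytes)  # signed 16-bit samples
    width = samples.itemsize               # bytes per sample

    # Skip the run of exact zeros at the start.
    first = 0
    while first < len(samples) and samples[first] == 0:
        first += 1

    # Skip the run of exact zeros at the end.
    last = len(samples)
    while last > first and samples[last - 1] == 0:
        last -= 1

    # Zeros inside the utterance are left untouched.
    return pcm_bytes[first * width:last * width]
```

The trimmed buffer would then be passed to the decoder in place of the padded one. Only exact zeros are removed, so genuine microphone silence, which is rarely exactly zero, is kept.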
