
I installed Pocketsphinx 0.7 on a VM running Debian Squeeze. That worked fine, and I can recognize speech from files. On top of this I built some Python scripts that recognize a set of files I have and then estimate the word error rate. These use gstreamer as described in this tutorial.

So far I am using the original hmm from the pocketsphinx tarball, a dictionary that contains only the words from my test data, and an optimized language model I got from my professor. This should work, as the same model is also running in a production system. My problem is that the recognition performance is still horrible: I have a word error rate (WER) of about 85%.
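For reference, WER is usually computed as the word-level edit distance (substitutions, deletions, insertions) between the transcription and the hypothesis, divided by the number of words in the transcription. A minimal Python sketch, assuming plain whitespace tokenization and no text normalization; the function name is illustrative, and this is not necessarily how the scripts mentioned above compute it:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the transcription. Case and punctuation differences also count as errors here, so both sides should be normalized the same way before comparing.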

What I want to know is how I can improve the WER. What kind of steps can I take?

Another thing that happens, and probably impacts performance, is that pocketsphinx tells me it has no permission to access the hmm, although I made the hmm readable, writable and executable for everyone.
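One way to narrow down such a permission message is to check the model directory, its contents, and every parent directory, since a parent without search (execute) permission blocks access even when the model files themselves are open to everyone. A rough Python sketch, assuming it is run as the same user that runs pocketsphinx; the function name and the example path are hypothetical:

```python
import os


def find_access_problems(model_dir):
    """Return paths that the current user cannot read or traverse."""
    model_dir = os.path.abspath(model_dir)
    if not os.path.isdir(model_dir):
        return [model_dir]  # missing, or not a directory

    problems = []

    # Every ancestor directory needs search (execute) permission.
    parent = os.path.dirname(model_dir)
    while True:
        if not os.access(parent, os.X_OK):
            problems.append(parent)
        next_parent = os.path.dirname(parent)
        if next_parent == parent:  # reached the filesystem root
            break
        parent = next_parent

    # The model directory itself must be readable and searchable.
    if not os.access(model_dir, os.R_OK | os.X_OK):
        problems.append(model_dir)

    # Check subdirectories and files inside the model directory.
    for root, dirs, files in os.walk(model_dir):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(path)
        for name in files:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems


# Example (hypothetical path):
# print(find_access_problems("/path/to/hmm"))
```

An empty list means the current user can reach and read everything; if the message still appears, the path passed to pocketsphinx may differ from the one that was checked.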

Does anyone have an idea where this may come from? I'd appreciate any kind of help. If you need more information, please let me know.


EDIT:

I created a small test set and ran pocketsphinx. This is where you can find the files and the results. I was allowed to give you some examples from the original test set. You can find them here.
These are the worst examples. Short utterances of 1-2 words work well. Sorry I couldn't create a bigger test set so far; my time is very limited.

Matas Vaitkevicius
elramino

1 Answer


What I want to know is how I can improve the WER. What kind of steps can I take?

This issue is covered in the Pocketsphinx FAQ:

http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor

The first step is to collect a database of test samples.

If you need help improving the accuracy, you need to share that database together with the results you expect and the results you actually get. You can share it here or on the Sourceforge forum. Pack all the files into an archive, upload it somewhere, and post a link here.

For more information see

http://cmusphinx.sourceforge.net/wiki/communicate

Nikolay Shmyrev
  • I've already seen the CMU Sphinx FAQ. My problem is that I must have misconfigured pocketsphinx, given the poor accuracy. Looking at my results, I think the language model is ignored. Since everything runs through the gstreamer plugin, it isn't covered on that page (at least I didn't find it). Due to legal issues I can't share samples, only hypotheses and transcriptions. I will edit them into my post. Thanks for your answer – elramino Jul 02 '12 at 08:21
  • I have checked again and I cannot even share the outputs. I will take free examples and post the results here asap. – elramino Jul 02 '12 at 09:28
  • Given the data you have shared now, it seems the language model you are using is not quite correct. If you say that short utterances work, then most likely the language model is trained to recognize short words first of all. With the default pocketsphinx model on the set you shared, the error rate is 64%, not 85%. With a good language model it can be 40%. I also see that you recorded UK English, not US English. With acoustic model adaptation from the US English model to UK English you can reduce the error rate to 20% or even less. – Nikolay Shmyrev Jul 03 '12 at 17:29
  • Unfortunately the error can't be in the model itself, as it is also used in a production environment. Also, I am not achieving a result as good as 64% with any lm. I was told that there is a weight that controls how much the lm influences the results. So far I couldn't find any way to change this weight. Do you know anything about this? – elramino Jul 04 '12 at 14:11
  • This is the first thing you need to look at: you should not think about any weights until you get 64% with the default lm. This small command: pocketsphinx_batch -samprate 8000 -ctl res.ctl -hyp res.hyp -adcin yes -cepext .wav -cepdir . should give you a result like http://pastebin.com/7FsKaEfW. Only after that can you start to improve your accuracy. When someone tells you something about speech recognition, you need to be very careful about it. There are too many people around who just do not have the right picture – Nikolay Shmyrev Jul 04 '12 at 14:17
  • Well, I am getting the exact same result with pocketsphinx_batch. My problem seems to be not in the configuration but in the gstreamer pipeline. I think this can be regarded as solved. If you have any experience with pocketsphinx in that area, I'd still appreciate your help. – elramino Jul 06 '12 at 09:06
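The pocketsphinx_batch check from the comments can be scripted so that the baseline is easy to reproduce. The sketch below writes a control file and builds the same command line. It assumes the control file lists one utterance ID per line without the extension, so that the directory given with -cepdir plus the ID plus the -cepext suffix names the audio file. The helper names are hypothetical; the flags are exactly those quoted in the comment.

```python
import os


def write_control_file(wav_dir, ctl_path):
    """Write one utterance ID (file name without .wav) per line."""
    ids = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(wav_dir)
        if name.endswith(".wav")
    )
    with open(ctl_path, "w") as ctl:
        for utt_id in ids:
            ctl.write(utt_id + "\n")
    return ids


def batch_command(wav_dir, ctl_path, hyp_path, samprate=8000):
    """Build the argument list for the command quoted in the comments."""
    return [
        "pocketsphinx_batch",
        "-samprate", str(samprate),
        "-ctl", ctl_path,
        "-hyp", hyp_path,
        "-adcin", "yes",
        "-cepext", ".wav",
        "-cepdir", wav_dir,
    ]


# Usage, assuming pocketsphinx_batch is on the PATH:
# import subprocess
# write_control_file(".", "res.ctl")
# subprocess.run(batch_command(".", "res.ctl", "res.hyp"), check=True)
```

Comparing the resulting res.hyp against the transcriptions gives a baseline that does not depend on the gstreamer pipeline, which is what isolated the problem in this thread.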