I downloaded CMU SphinxBase (sphinxbase-5prealpha.tar.gz) and PocketSphinx (pocketsphinx-5prealpha.tar.gz), installed all the required packages (sudo apt-get install libtool bison python-dev autotools swig), and ran through all the steps in the tutorial (http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx).

On my RPi I ran:

pocketsphinx_continuous -inmic yes

I have a USB Logitech webcam, which performed well with Google API V2.

I spoke all the English words I know into pocketsphinx_continuous, and it only gave me messages like the ones below. I was hoping it would do some recognition that I could then start to improve, but with zero recognition I am not sure how to improve it.

READY....
Listening...
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 34.68 -4.34  8.66 -9.45 -0.21 -2.80  2.86  1.73  6.98  5.36  4.14  0.69  1.67 >
INFO: ngram_search_fwdtree.c(1553):      961 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1555):   497161 senones evaluated (3551/fr)
INFO: ngram_search_fwdtree.c(1559):  1453632 channels searched (10383/fr), 98192 1st, 13846 last
INFO: ngram_search_fwdtree.c(1562):     2097 words for which last channels evaluated (14/fr)
INFO: ngram_search_fwdtree.c(1564):    40961 candidate words for entering last phone (292/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 11.18 CPU 7.986 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 24.17 wall 17.265 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 6 words
INFO: ngram_search_fwdflat.c(948):      696 words recognized (5/fr)
INFO: ngram_search_fwdflat.c(950):     8170 senones evaluated (58/fr)
INFO: ngram_search_fwdflat.c(952):     4239 channels searched (30/fr)
INFO: ngram_search_fwdflat.c(954):      940 words searched (6/fr)
INFO: ngram_search_fwdflat.c(957):      276 word transitions (1/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.86 CPU 0.614 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 1.77 wall 1.265 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.47
INFO: ngram_search.c(1279): Eliminated 2 nodes before end node
INFO: ngram_search.c(1384): Lattice has 243 nodes, 194 links
INFO: ps_lattice.c(1380): Bestpath score: -1185
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:47:138) = -75028
INFO: ps_lattice.c(1441): Joint P(O,S) = -97858 P(S|O) = -22830
INFO: ngram_search.c(875): bestpath 0.01 CPU 0.007 xRT
INFO: ngram_search.c(878): bestpath 0.02 wall 0.015 xRT
READY....
Listening...
Input overrun, read calls are too rare (non-fatal)
INFO: ngram_search.c(467): Resized score stack to 200000 entries
INFO: ngram_search_fwdtree.c(952): cand_sf[] increased to 64 entries
INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
INFO: ngram_search.c(467): Resized score stack to 400000 entries
Input overrun, read calls are too rare (non-fatal)
INFO: ngram_search.c(459): Resized backpointer table to 20000 entries
Input overrun, read calls are too rare (non-fatal)
Input overrun, read calls are too rare (non-fatal)
SC-SL

1 Answer

Large-vocabulary speech recognition is not practical on a Raspberry Pi; it is too slow for that. Your log shows the fwdtree pass at 17.265 xRT wall time, that is, about 17 times slower than real time.
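To make the timing figures concrete, here is a minimal Python sketch that pulls the "wall ... xRT" lines out of the log in the question. It assumes xRT means processing time divided by audio duration, so dividing the wall time by xRT gives an estimate of how much audio was decoded. The names `wall_xrt` and `PATTERN` are made up for this illustration and are not part of PocketSphinx.

```python
import re

# Timing lines copied from the log in the question.
LOG = """\
INFO: ngram_search_fwdtree.c(1570): fwdtree 24.17 wall 17.265 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 1.77 wall 1.265 xRT
INFO: ngram_search.c(878): bestpath 0.02 wall 0.015 xRT
"""

# Matches e.g. "fwdtree 24.17 wall 17.265 xRT" (hypothetical helper pattern).
PATTERN = re.compile(r"(\w+) ([\d.]+) wall ([\d.]+) xRT")


def wall_xrt(log_text):
    """Return {pass_name: (wall_seconds, xrt)} for each 'wall' timing line."""
    return {
        m.group(1): (float(m.group(2)), float(m.group(3)))
        for m in PATTERN.finditer(log_text)
    }


timings = wall_xrt(LOG)
for name, (wall, xrt) in timings.items():
    # Assumption: xRT = processing time / audio duration.
    audio_seconds = wall / xrt
    print(f"{name}: {wall} s wall, {xrt} xRT, ~{audio_seconds:.1f} s of audio")
```

Under that assumption, the first pass alone spent about 24 seconds on roughly 1.4 seconds of audio, which also fits the "Input overrun" messages: the decoder cannot read from the microphone fast enough.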

You can stream the audio data to a server, or, if you still want to recognize on the device, configure a small grammar for recognition.
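As a rough illustration of the small-grammar option, the Python sketch below writes a tiny JSGF grammar file. The grammar name, the command words, the file name `commands.gram` and the helper `build_jsgf` are all made up for this example; every word used must also be present in the pronunciation dictionary you run with.

```python
def build_jsgf(name, actions, objects):
    """Build a JSGF grammar that accepts '<action> <object>' phrases."""
    action_alternatives = " | ".join(actions)
    object_alternatives = " | ".join(objects)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <command> = ({action_alternatives}) ({object_alternatives});\n"
    )


# Hypothetical command set; replace with the phrases you actually need.
grammar = build_jsgf("commands", ["turn on", "turn off"], ["light", "fan"])

# Hypothetical output file name.
with open("commands.gram", "w") as f:
    f.write(grammar)

print(grammar)
```

The resulting file can then be passed to the decoder in place of the default language model, for example with `pocketsphinx_continuous -inmic yes -jsgf commands.gram`. With only a handful of phrases to search, the decoder has far less work per frame than with the full vocabulary.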

Nikolay Shmyrev