We're trying to get CMU Sphinx4 to distinguish between just two words, yes and no, in Spanish ("si" and "no"). We've implemented Sphinx4 with the Spanish model es_cont_2000 from VoxForge. We've created the language model (attached below), and when recognizing the word "No" we get almost 100% accuracy. However, when recognizing "Si" (Yes), accuracy is only about 50%.
Does anyone have suggestions for getting better accuracy with such a small set of words, aside from adapting the language model (http://cmusphinx.sourceforge.net/wiki/tutorialadapt)?
Are there better language models for Latin American Spanish, or other approaches we could try?
This is an ARPA-format language model file, generated by CMU Sphinx
\data\
ngram 1=4
ngram 2=4
ngram 3=4
\1-grams:
-0.7782 </s> -0.1761
-0.3010 <s> -0.5228
-0.7782 no -0.3978
-0.7782 si 0.0000
\2-grams:
-0.1761 </s> <s> -0.0791
-0.3978 <s> no 0.1761
-0.3978 <s> si -0.2217
-0.1761 no </s> 0.1761
\3-grams:
-0.3010 </s> <s> si
-0.3010 <s> no </s>
-0.3010 <s> si </s>
-0.3010 no </s> <s>
\end\
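
As a quick sanity check on the model above, here is a minimal Python sketch that parses the ARPA text and compares the entries for "si" and "no". It is not a fix, only a way to inspect the file. `parse_arpa` is a hypothetical helper written for this post, not part of CMU Sphinx, and it assumes the simple layout shown above (log10 probability, words, optional backoff weight).

```python
# Sketch only: parse_arpa is a hypothetical helper, not a CMU Sphinx API.
# Assumes the plain ARPA layout shown in the post.

ARPA_TEXT = r"""
\data\
ngram 1=4
ngram 2=4
ngram 3=4
\1-grams:
-0.7782 </s> -0.1761
-0.3010 <s> -0.5228
-0.7782 no -0.3978
-0.7782 si 0.0000
\2-grams:
-0.1761 </s> <s> -0.0791
-0.3978 <s> no 0.1761
-0.3978 <s> si -0.2217
-0.1761 no </s> 0.1761
\3-grams:
-0.3010 </s> <s> si
-0.3010 <s> no </s>
-0.3010 <s> si </s>
-0.3010 no </s> <s>
\end\
"""


def parse_arpa(text):
    """Return (declared, ngrams).

    declared[n] is the count from the "ngram n=count" header line;
    ngrams[n] maps a word tuple to (log10_prob, log10_backoff_or_None).
    """
    declared, ngrams, order = {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            declared[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:" sets the current order.
            order = int(line[1:-len("-grams:")])
            ngrams[order] = {}
        elif order is not None:
            fields = line.split()
            logprob = float(fields[0])
            words = tuple(fields[1:1 + order])
            # The backoff weight is optional (absent on the highest order).
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
            ngrams[order][words] = (logprob, backoff)
    return declared, ngrams


declared, ngrams = parse_arpa(ARPA_TEXT)

# Compare the two words at the unigram and bigram level.
for word in ("no", "si"):
    uni_logprob, _ = ngrams[1][(word,)]
    print(word, "unigram log10 prob:", uni_logprob,
          "| has bigram (word, </s>):", (word, "</s>") in ngrams[2])
```

In this file the unigram and `<s>`-bigram log-probabilities for "si" and "no" are identical, so those entries do not favor either word. The one asymmetry the check surfaces is that the bigram `no </s>` is listed while `si </s>` is not, even though the trigram `<s> si </s>` is present.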