
I am building a voice-driven dictionary application. The dictionary has a database of about 100,000 words and needs to be searchable by voice. I am using Sphinx4 (CMUSphinx) for recognition. I have read the references on the related websites and successfully run the sample applications. I then applied the same approach as the HelloWorld sample to my dictionary, after putting all 100,000 words into the grammar (.gram) file. When I run it, the application freezes, and about 5 minutes later Eclipse shows "Java Heap Size Out of Memory".

Grammar configuration:

#JSGF V1.0;
grammar hello;
public <database> = ([<Words>])*;
<Words>= 100000 words split by "|"

For Sphinx4, I used this version: http://sourceforge.net/projects/cmusphinx/files/sphinx4/1.0%20beta6/

Is my approach to implementing voice search in my dictionary correct?

Are there any good references for building such a search engine with a large database of words (approximately 100,000)?

I hope you can help.

barryhunter
davinma06

1 Answer


The approach is OK.

If the JVM does not have enough memory, you can increase the heap size with the -Xmx option.
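
As a quick check that a -Xmx setting actually took effect, the following minimal sketch prints the maximum heap the running JVM will use. The class name HeapCheck and the helper maxHeapMiB are made up for this illustration; Runtime.maxMemory() is the standard Java call.

```java
public class HeapCheck {
    // Hypothetical helper: maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Compare this value against the -Xmx setting you passed to the JVM.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as, for example, `java -Xmx2g HeapCheck` should report a value close to 2048 MiB. In Eclipse, the same flag goes in the VM arguments of the run configuration.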

For accurate retrieval, it is better to create a unigram language model with word frequencies rather than a plain list. For details, see

http://cmusphinx.sourceforge.net/wiki/tutoriallm
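
To illustrate what "a unigram model with frequencies" means, here is a minimal sketch that counts words in a token list and turns the counts into base-10 log probabilities. It does not replace the tools described in the tutorial and does not write a model file; the class and method names (UnigramCounts, count, log10Probabilities) and the sample tokens are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnigramCounts {
    // Count how often each word occurs in the token list.
    static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Convert counts to log10 relative frequencies (unsmoothed).
    static Map<String, Double> log10Probabilities(Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> logProbs = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            logProbs.put(e.getKey(), Math.log10((double) e.getValue() / total));
        }
        return logProbs;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real model would be built from a large text corpus.
        List<String> tokens = Arrays.asList("hello", "world", "hello", "hello");
        Map<String, Double> logProbs = log10Probabilities(count(tokens));
        for (Map.Entry<String, Double> e : logProbs.entrySet()) {
            System.out.printf("%s %.4f%n", e.getKey(), e.getValue());
        }
    }
}
```

With frequencies attached, a common word is preferred over a rare one that sounds similar, which a flat list of 100,000 alternatives cannot express.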

For the best accuracy, it is better to use the latest high-level API. For details, see

http://cmusphinx.sourceforge.net/wiki/sphinx4

Nikolay Shmyrev
  • Thank you for your answer. I had already increased the JVM heap to 1024 and I still do not get what I expected. When I said "Hello", nothing happened; the spoken word was not printed. Is it very hard to read 100,000 words? After reading http://cmusphinx.sourceforge.net/wiki/tutoriallm , I am confused about the dmp and lm formats. The sample application (HelloNGram) uses .lm. What is the difference between the two, and which is better? How can I use the dmp format in my program? I could not find it in the given references. – davinma06 Nov 24 '14 at 19:13
  • First of all, please upgrade to the latest version. Then you need to download the en-us-generic acoustic model for the best accuracy. I'm not sure what you mean by "implement dmp format into my program". The lm and dmp formats are equivalent representations of the language model: lm is text and dmp is binary. There is a tool that converts between them, as described in the tutorial. – Nikolay Shmyrev Nov 24 '14 at 19:31
  • Hi Nikolay. I have a question about your comment in http://stackoverflow.com/questions/26925322/cmusphinx-live-speech-recognition-too-slow. I am very curious about those four values in config.xml. How do they affect Sphinx4's speed and accuracy? If they do have an influence, I would like to edit those values to find the best setting in terms of accuracy and speed. Are there any references or guides on how to modify them? I have already downloaded and used the latest English language model (.dmp), but the accuracy is still not what I expected. I used Sphinx4-5 alpha. – davinma06 Dec 03 '14 at 16:18