I have a very specific requirement. I am working on an application that lets users speak their employee number into the app. The number has a format like HN56C12345 (an arbitrary alphanumeric sequence). I have gone through the tutorial at http://cmusphinx.sourceforge.net/wiki/tutoriallm, but I am not sure whether it would work for my use case.

So my question is threefold:

  1. Can Sphinx4 actually recognize an alphanumeric sequence, such as an employee number in my case, with high accuracy?
  2. If yes, can anyone point me to a concrete example or reference page where someone has built custom language support in Sphinx4 from scratch? I haven't found a detailed step-by-step document on this yet. Has anyone worked on dictionaries or language models for alphanumeric sequences?
  3. How do I build an acoustic model for this scenario?
Stefanus
Qedrix

1 Answer

You don't need a new acoustic model for this; you need a custom grammar. See http://cmusphinx.sourceforge.net/wiki/tutoriallm#building_a_grammar and http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/jsgf/JSGFGrammar.html to learn more. Sphinx4 recognizes individual characters just fine if you put them in the grammar separated by spaces:

#JSGF V1.0;
grammar jsgf.emplID;
<digit> = zero | one | two | three | four | five | six | seven | eight | nine ;
<digit2> = <digit> <digit>   ;
<digit4> = <digit2> <digit2> ;
<digit5> = <digit4> <digit>  ;
// This rule accepts IDs of a kind: hn<2 digits>c<5 digits>.
public <id> = h n <digit2> c <digit5> ;
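
With a grammar like this, the recognizer's hypothesis would be a string of space-separated tokens rather than the ID itself. A minimal sketch of turning that string back into an ID, assuming the hypothesis looks like `h n five six c one two three four five` (the class and method names are hypothetical and not part of Sphinx4):

```java
import java.util.Map;

// Hypothetical helper, not part of Sphinx4: converts a token string produced
// by the grammar above back into a compact employee ID.
class HypothesisToId {
    // Spoken digit words used in the <digit> rule, mapped to their characters.
    private static final Map<String, Character> DIGITS = Map.of(
            "zero", '0', "one", '1', "two", '2', "three", '3', "four", '4',
            "five", '5', "six", '6', "seven", '7', "eight", '8', "nine", '9');

    static String decode(String hypothesis) {
        StringBuilder id = new StringBuilder();
        for (String token : hypothesis.trim().split("\\s+")) {
            Character digit = DIGITS.get(token);
            if (digit != null) {
                id.append(digit);
            } else if (token.length() == 1 && Character.isLetter(token.charAt(0))) {
                // Single letters are lowercase in the grammar; IDs use uppercase.
                id.append(Character.toUpperCase(token.charAt(0)));
            } else {
                throw new IllegalArgumentException("Unexpected token: " + token);
            }
        }
        return id.toString();
    }

    public static void main(String[] args) {
        // prints HN56C12345
        System.out.println(decode("h n five six c one two three four five"));
    }
}
```

Rejecting unknown tokens instead of skipping them keeps a misrecognized word from silently producing a shorter, wrong ID.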

As for accuracy, there are two ways to increase it. If the number of employees isn't too large, you can build the grammar from the list of all existing employee IDs, so that the recognizer can only return a valid ID. Otherwise, a generic grammar is your only option. It is also possible to write a custom scorer that uses context information to predict the employee ID better than the generic algorithm, but this requires some knowledge of both ASR and the CMU Sphinx code.
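
The first option, a grammar that lists every existing employee ID, could be generated rather than written by hand. The following is only a sketch under stated assumptions: the ID list is made up, the class and method names are hypothetical, and the output reuses the `jsgf.emplID` grammar name from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator, not part of Sphinx4: builds a JSGF grammar whose
// public rule is an alternation over all known employee IDs.
class EnumeratedGrammar {
    private static final String[] DIGIT_WORDS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Spells an ID as lowercase letters and digit words, separated by spaces.
    static String spell(String id) {
        List<String> tokens = new ArrayList<>();
        for (char ch : id.toCharArray()) {
            if (ch >= '0' && ch <= '9') {
                tokens.add(DIGIT_WORDS[ch - '0']);
            } else if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
                tokens.add(String.valueOf(Character.toLowerCase(ch)));
            } else {
                throw new IllegalArgumentException("Unsupported character: " + ch);
            }
        }
        return String.join(" ", tokens);
    }

    static String build(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("At least one ID is required");
        }
        StringBuilder sb = new StringBuilder("#JSGF V1.0;\ngrammar jsgf.emplID;\npublic <id> =\n");
        for (int i = 0; i < ids.size(); i++) {
            // The first alternative has no leading '|'.
            sb.append(i == 0 ? "    " : "  | ").append(spell(ids.get(i))).append('\n');
        }
        return sb.append("  ;\n").toString();
    }

    public static void main(String[] args) {
        // Made-up IDs for illustration only.
        System.out.print(build(List.of("HN56C12345", "AB12C00001")));
    }
}
```

The generated text would be saved as a grammar file and loaded in the same way as the hand-written one.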

Alexander Solovets
  • Thanks, Alexander, for your response. After following your instructions for Sphinx4 and writing my own grammar as you showed, I tried to run the LiveSpeechRecognizer using Maven, but I got the message: The dictionary is missing a phonetic transcription for the word 'H' (and for other words). What am I doing wrong? I used this tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. The dictionary path is set with configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict"); – Qedrix Oct 09 '15 at 10:08
  • Yup, I forgot that CMU Sphinx has a lowercase dictionary, so change 'H' to 'h' and so forth. I updated my answer above. – Alexander Solovets Oct 09 '15 at 15:30
  • I also replaced the digits with words, since the CMU dictionary doesn't contain digits. – Alexander Solovets Oct 09 '15 at 16:12
  • Thanks again. I will try this in a bit. Do you also think that reducing the size of the generic dictionary I am using at the moment would help in our case? I have often observed that when I say MH, the recognizer picks the word 'Image' from the dictionary instead. On closer inspection I found that the dictionary really does define a matching pronunciation for that word. Is there anything I can do about it? – Qedrix Oct 10 '15 at 18:27
  • No, the dictionary has nothing to do with the set of sentences your application is able to recognize. It's all in the language model. You can try decreasing the word insertion probability to get rid of unwanted words. – Alexander Solovets Oct 10 '15 at 23:04
  • This is working, but it looks like I must speak very fast for it to be accurate. Also, how can we add the letters of the alphabet in the same way we use the digits in the grammar? Say I want the characters in the ID to appear in any order. – Qedrix Oct 12 '15 at 07:15
  • The only way to add the whole alphabet is to construct the corresponding rule letter by letter, i.e. something like `<letter> = a | b | c | d ... `. As for accuracy, you have to share your recordings as well as the code; otherwise it's hard to guess what caused the problem. I recommend posting questions like these to the [sphinx forum](http://sourceforge.net/p/cmusphinx/discussion/). – Alexander Solovets Oct 12 '15 at 15:53
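
The letter-by-letter rule described in the last comment could also be generated instead of typed out. A rough sketch, assuming hypothetical rule names `<letter>` and `<char>` and using JSGF's `+` operator (one or more repetitions) so that letters and digits may appear in any order; whether an unconstrained rule like this is accurate enough in practice is not something the thread establishes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator, not part of Sphinx4: builds a generic grammar that
// accepts any sequence of spoken letters and digit words.
class AlphanumericGrammar {
    // Produces "<letter> = a | b | ... | z ;" by walking the alphabet.
    static String letterRule() {
        List<String> letters = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            letters.add(String.valueOf(c));
        }
        return "<letter> = " + String.join(" | ", letters) + " ;";
    }

    static String build() {
        return String.join("\n",
                "#JSGF V1.0;",
                "grammar jsgf.emplID;",
                "<digit> = zero | one | two | three | four | five | six | seven | eight | nine ;",
                letterRule(),
                "<char> = <letter> | <digit> ;",
                "// One or more characters in any order.",
                "public <id> = <char>+ ;",
                "");
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```

If the IDs have a fixed length, spelling out that many `<char>` references in place of `<char>+` would constrain the recognizer further.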