I'm trying to implement a German command and control application with CMUSphinx and Java. So far, the application should recognize only a few words (numbers from 1 to 9, yes/no).

Unfortunately, the accuracy is very bad. When a word is recognized correctly, it seems to be only by chance.

Here is my Java code so far (adapted from the tutorial):

import java.io.IOException;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

// The class name is arbitrary.
public class DialogDemo {

    public static void main(String[] args) throws IOException {

        // Configuration object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/cmusphinx-de-voxforge-5.2");

        // Set path to the dictionary.
        configuration.setDictionaryPath("resource:/cmusphinx-voxforge-de.dic");

        // Use the JSGF grammar "dialog" from the classpath root.
        configuration.setGrammarPath("resource:/");
        configuration.setGrammarName("dialog");
        configuration.setUseGrammar(true);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);

        // Start listening on the microphone; true discards previously cached audio.
        recognizer.startRecognition(true);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}

Here is my grammar file:

#JSGF V1.0;

grammar dialog;

public <digit> = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ja | nein;

I've downloaded the German acoustic model and dictionary from here: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/

Is there something obvious I'm missing here? Where is the problem?

Thanks in advance and kind regards.

spaenigs
  • You need to provide audio data and the models you changed so that your problem can be reproduced. Digits like 1, 2 are not part of the dictionary by default, so you cannot use them in a grammar. – Nikolay Shmyrev Apr 17 '17 at 12:49
  • Thanks for your reply. 1) What do you mean by audio data? The audio which I want to recognize? Or audio for a new acoustic model? 2) I changed 1 to eins and so on (which are part of the dictionary). It doesn't seem to improve the accuracy :( – spaenigs Apr 17 '17 at 14:51
  • I furthermore added `cmusphinx-voxforge-de.lm.bin`. No effect. – spaenigs Apr 17 '17 at 14:57
  • You need to provide the audio you want to recognize as a file and you also need to provide the other data files you are using. – Nikolay Shmyrev Apr 17 '17 at 15:05
  • Ah ok, thanks again. If I have to provide my answers as a file, what is the point of `LiveSpeechRecognizer`? I thought I can use my microphone and live recognize speech? – spaenigs Apr 17 '17 at 15:26
  • To debug accuracy issues you need an audio file. That will help you (and me) to reproduce the problem. Reproduction is the first step to a solution. – Nikolay Shmyrev Apr 17 '17 at 22:16
  • You can find the sample audio files [here](https://jlubox.uni-giessen.de/dl/fiFwpyKywVPgBHy7DiyhKB4H/audio.zip). They were recorded at 16 bit and 16 kHz. I was able to increase the accuracy to ~50 %. What do you recommend as further steps? – spaenigs Apr 20 '17 at 12:34
  • I will try to adapt the acoustic model as described [on this page](http://cmusphinx.sourceforge.net/wiki/tutorialadapt). – spaenigs Apr 21 '17 at 08:42
  • I was able to adapt the model and increase the accuracy to ~65 %. 1) Do you think that's a good result for 11 words? 2) Does it make sense to record the same word a couple of times from different people and adapt again? 3) How can I add the mllr_matrix file from the `bw` run to my sphinx4 application? The instructions I found [here](https://sourceforge.net/p/cmusphinx/mailman/cmusphinx-commits/thread/From_nshmyrev@users.sourceforge.net_Sun_Sep_28_10:54:17_2014/) are not working. P.S. Thank you :) – spaenigs Apr 21 '17 at 14:55
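
The dictionary point raised in the comments (a grammar may only use words that have a dictionary entry) can be checked mechanically before the recognizer is started. The following is a minimal sketch of such a pre-flight check and is not part of sphinx4. It assumes the dictionary is a plain text file with one entry per line, the word first and its phones after it, and that alternative pronunciations are written as `word(2)`. The class name `GrammarDictionaryCheck`, the sample entries, and the phone symbols `P1`, `P2`, ... are placeholders invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a sphinx4 class.
final class GrammarDictionaryCheck {

    /** Collects the head word of every non-empty dictionary line. */
    static Set<String> dictionaryWords(List<String> dictionaryLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictionaryLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String head = trimmed.split("\\s+")[0];
            // Assumed convention: alternative pronunciations appear as "word(2)".
            words.add(head.replaceAll("\\(\\d+\\)$", ""));
        }
        return words;
    }

    /** Returns the grammar words without a dictionary entry, in input order. */
    static List<String> missingWords(List<String> grammarWords, Set<String> dictionary) {
        List<String> missing = new ArrayList<>();
        for (String word : grammarWords) {
            if (!dictionary.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up dictionary excerpt; the phone symbols are placeholders.
        List<String> dictionaryLines = Arrays.asList(
                "eins P1 P2 P3", "zwei P4 P5", "ja P6 P7", "nein P8 P9");
        // Words as they would appear on the right-hand side of the JSGF rule.
        List<String> grammarWords = Arrays.asList("1", "eins", "ja", "sieben");

        Set<String> dictionary = dictionaryWords(dictionaryLines);
        System.out.println("Missing from dictionary: "
                + missingWords(grammarWords, dictionary));
        // prints: Missing from dictionary: [1, sieben]
    }
}
```

For a real check, the dictionary lines could be read from the `.dic` file with `java.nio.file.Files.readAllLines`. The lookup is deliberately case-sensitive, so a grammar word has to match the dictionary spelling exactly.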

2 Answers

Well, the accuracy is not great, probably because the original training database did not contain many examples like yours. Your dialect contributes in part: Germans say 7 with z, not with s. Echo in your room contributes too. I am not sure how you recorded your audio; if a compression step or codec was involved, that might also contribute to the bad accuracy.

You might want to collect a few hundred samples and perform MAP adaptation to improve the accuracy.

Nikolay Shmyrev

I have tried pocketsphinx with the English and German models, and the accuracy is good when it comes to a predefined, limited set of phrases. You can forget about general requests like "could you please find me a restaurant in the downtown".

To achieve good accuracy with pocketsphinx:

  • Check that your mic, audio device, audio files and everything else use 16 kHz, because the general model was trained on audio with that sample rate.
  • Create your own limited dictionary. You cannot use cmusphinx-voxforge-de.dic as is, because accuracy drops dramatically with it.
  • Create your own language model.
  • Try modifying the pronunciations in the dictionary to fit your accent.
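
The first point in the list above can be verified for recorded files with the Java standard library alone. The sketch below is a hedged illustration, not part of pocketsphinx or sphinx4: it reads the header of an audio file and reports whether it is 16 kHz, 16-bit signed PCM. The mono requirement is an assumption based on the usual input format for the stock CMUSphinx models, and the class name `AudioFormatCheck` is made up.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper, not part of any CMUSphinx library.
final class AudioFormatCheck {

    /** True if the format is 16 kHz, 16-bit, signed PCM, mono (mono is an assumption). */
    static boolean matchesModel(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: AudioFormatCheck <audio file>");
            return;
        }
        // Reads only the file header, not the audio samples.
        AudioFormat format =
                AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println(args[0] + ": " + format
                + (matchesModel(format) ? " -> OK" : " -> MISMATCH"));
    }
}
```

Pass the path of a recording as the only argument. A mismatch means the file should be resampled or re-recorded before it is used for recognition or adaptation.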

You can search for the Jasper project on GitHub to see how it's implemented. You can also check the documentation.

Ievgen