
I have a program that recognizes speech quite well with System.Speech using SpeechRecognitionEngine. However, although it is accurate, it seems to throw away some of the audio input it receives. If I say "one, two, three" with pauses between the words, it transcribes each word correctly. If I say them without pauses, it transcribes the first word, and sometimes the third, correctly, but the second word is simply ignored.

Other people have had this problem, but I haven't been able to find their solutions (see, for example, the question "Microsoft Speech Recognition Speed").

Ideally, I would set the recognizer's audio position back to an earlier point in the audio stream, but I haven't found a function in the API that lets me do this. Another approach I considered was to run multiple recognition engines, each of which would try to take just one word and would be reused once it had finished with that word, but that is a very complex and resource-hungry solution.

Any help on this problem would be appreciated.

I've cut it down to this piece of C# code:

using System;
using System.Speech.Recognition;

SpeechRecognitionEngine recognizer_;

public void Init()
{
    // Create an in-process speech recognizer for the en-US locale.
    var cultureInfo = new System.Globalization.CultureInfo("en-US");
    recognizer_ = new SpeechRecognitionEngine(cultureInfo);

    // Create a grammar that accepts one of the words "one", "two" or "three".
    var numbers = new Choices();
    numbers.Add(new string[] { "one", "two", "three" });

    // Create a GrammarBuilder object and append the Choices object.
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(numbers);
    var g = new Grammar(gb);
    recognizer_.LoadGrammar(g);

    // Add a handler for the speech recognized event.
    recognizer_.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
    recognizer_.SpeechDetected += recognizer_SpeechDetected;

    // Configure input to the speech recognizer.
    recognizer_.SetInputToDefaultAudioDevice();

    // Start asynchronous, continuous speech recognition.
    recognizer_.RecognizeAsync(RecognizeMode.Multiple);
}

void recognizer_SpeechDetected(object sender, SpeechDetectedEventArgs e)
{
    Console.WriteLine("\nspeech detected event audio position:\t\t" + e.AudioPosition);
    Console.WriteLine("speech detected current audio position:\t\t" + recognizer_.AudioPosition);
    Console.WriteLine("speech detected recognizer audio position:\t" + recognizer_.RecognizerAudioPosition);
}

// Handle the SpeechRecognized event.
void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("speech recognized event audio position:\t\t" + e.Result.Audio.AudioPosition);
    Console.WriteLine("speech recognized event audio start time: " + e.Result.Audio.StartTime);
    Console.WriteLine(e.Result.Text);

    // do things
    // ...
}
Asked by Phlox Midas

1 Answer


Instead of

gb.Append(numbers);

which tells the recognizer to accept only a single, isolated number per utterance (so when you say several numbers without pausing, only one of them can match), try something like

gb.Append(new GrammarBuilder(numbers), 1, 5);

which accepts a sequence of one to five numbers in a single utterance. Adjust the repetition count to your needs.
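A minimal, untested sketch of the suggested grammar, assuming Windows, a reference to System.Speech, and an installed en-US recognizer. The class and method names (`RepeatedNumbersDemo`, `BuildNumberGrammar`, `RecognizeWords`) are hypothetical. Instead of a microphone it uses `SpeechRecognitionEngine.EmulateRecognize`, which matches typed text against the loaded grammars, so you can check that "one two three" is accepted as a single result and then read the individual words from `RecognitionResult.Words`. The grammar's culture is set to the recognizer's culture because a mismatch between the two can make `LoadGrammar` fail on machines with a non-English UI culture.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical demo class (not from the original answer).
public static class RepeatedNumbersDemo
{
    // Grammar from the answer: 1 to 5 consecutive "one" / "two" / "three".
    public static Grammar BuildNumberGrammar(CultureInfo culture)
    {
        var numbers = new Choices("one", "two", "three");
        var gb = new GrammarBuilder { Culture = culture };
        gb.Append(new GrammarBuilder(numbers), 1, 5);
        return new Grammar(gb);
    }

    // Emulates recognition of typed text (no audio device needed) and returns
    // the recognized words, or an empty array if the text does not match.
    public static string[] RecognizeWords(string text)
    {
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            recognizer.LoadGrammar(BuildNumberGrammar(recognizer.RecognizerInfo.Culture));

            RecognitionResult result = recognizer.EmulateRecognize(text);
            if (result == null)
            {
                return new string[0];
            }
            return result.Words.Select(w => w.Text).ToArray();
        }
    }

    public static void Main()
    {
        foreach (string word in RecognizeWords("one two three"))
        {
            Console.WriteLine("recognized word: " + word);
        }
    }
}
```

In the live program from the question only the `gb.Append` line needs to change; the `SpeechRecognized` handler should then receive the whole phrase as one result, with each word available in `e.Result.Words`.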

Answered by Nikolay Shmyrev
  • Wow. I hadn't considered that the grammar was to blame. Thanks. BTW, is there a downside to having a large number like 20 or so? Just memory usage? I could live with that. – Phlox Midas Feb 04 '14 at 15:32
  • There are no significant downsides, it all depends on the task you are trying to solve. If you want to allow many repetitions it's fine. – Nikolay Shmyrev Feb 04 '14 at 15:54