I'm trying to use speech recognition in .NET to transcribe a podcast from an MP3 file and get the result as a string. All the examples I've seen use the microphone, but I want to supply a sample MP3 file as the audio source instead. Can anyone point me to a resource or post an example?

EDIT -

I converted the audio file to a WAV file and tried this code on it, but it only extracts the first 68 words.

public class MyRecognizer {
    public string ReadAudio() {
        SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
        Grammar gr = new DictationGrammar();
        sre.LoadGrammar(gr);
        sre.SetInputToWaveFile("C:\\Users\\Soham Dasgupta\\Downloads\\Podcasts\\Engadget_Podcast_353.wav");
        sre.BabbleTimeout = new TimeSpan(Int32.MaxValue);
        sre.InitialSilenceTimeout = new TimeSpan(Int32.MaxValue);
        sre.EndSilenceTimeout = new TimeSpan(100000000);
        sre.EndSilenceTimeoutAmbiguous = new TimeSpan(100000000);
        RecognitionResult result = sre.Recognize(new TimeSpan(Int32.MaxValue));
        return result.Text;
    }
}
Soham Dasgupta

2 Answers

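Try reading it in a loop. `Recognize()` performs a single recognition operation per call, so one call returns only the first recognized phrase.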

SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
Grammar gr = new DictationGrammar();
sre.LoadGrammar(gr);
sre.SetInputToWaveFile("C:\\Users\\Soham Dasgupta\\Downloads\\Podcasts\\Engadget_Podcast_353.wav");
sre.BabbleTimeout = new TimeSpan(Int32.MaxValue);
sre.InitialSilenceTimeout = new TimeSpan(Int32.MaxValue);
sre.EndSilenceTimeout = new TimeSpan(100000000);
sre.EndSilenceTimeoutAmbiguous = new TimeSpan(100000000); 

StringBuilder sb = new StringBuilder();
while (true)
{
    try
    {
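        // Each call returns one recognized phrase; null means nothing more was recognized.
        var recText = sre.Recognize();
        if (recText == null)
        {
            break;
        }

        // Add a space so consecutive phrases do not run together.
        sb.Append(recText.Text).Append(' ');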
    }
    catch (Exception ex)
    {   
        //handle exception      
        //...

        break;
    }
}
return sb.ToString();

If you have a Windows Forms or WPF application, run this code in a separate thread; otherwise it blocks the UI thread.
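
One way to keep the loop off the UI thread is sketched below. It assumes .NET 4.5 or later (for `Task.Run`) and a reference to `System.Speech`; the `Transcriber` class and its method names are made up for illustration and are not part of any library.

```csharp
using System.Speech.Recognition;
using System.Text;
using System.Threading.Tasks;

public static class Transcriber
{
    // Blocking transcription: the same Recognize() loop as above.
    public static string Transcribe(string wavPath)
    {
        using (var sre = new SpeechRecognitionEngine())
        {
            sre.LoadGrammar(new DictationGrammar());
            sre.SetInputToWaveFile(wavPath);

            var sb = new StringBuilder();
            RecognitionResult result;
            // Keep reading phrases until Recognize() returns null.
            while ((result = sre.Recognize()) != null)
            {
                sb.Append(result.Text).Append(' ');
            }
            return sb.ToString();
        }
    }

    // Runs the blocking loop on a thread-pool thread so the caller's
    // (UI) thread is not blocked while the file is processed.
    public static Task<string> TranscribeAsync(string wavPath)
    {
        return Task.Run(() => Transcribe(wavPath));
    }
}
```

From an `async` event handler this could be called as `string text = await Transcriber.TranscribeAsync(path);`, after which the result can be assigned to a control as usual.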

ProgramFOX
keyboardP
  • Yes, this works. I also edited your answer and added that if the OP uses WinForms/WPF, he should run the code in a separate thread, because otherwise it blocks the UI thread. – ProgramFOX Aug 01 '13 at 09:46
  • I get this error when I use your code above: `MyProgram.vshost.exe Information: 0 : SAPI does not implement phonetic alphabet selection.` – Micro Jan 25 '16 at 18:27
  • @MicroR - Try setting the culture to your locale http://stackoverflow.com/questions/27198683/sapi-does-not-implement-phonetic-alphabet-selection-exception – keyboardP Jan 25 '16 at 19:51
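
A minimal sketch of what setting the culture could look like. It assumes an `en-US` recognizer is installed; `RecognizerFactory` is a hypothetical helper name, and the culture string should be replaced with one that `InstalledRecognizers()` actually reports on your machine.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

public static class RecognizerFactory
{
    // Prints the cultures of the recognizers installed on this machine.
    public static void ListInstalled()
    {
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0}: {1}", info.Culture.Name, info.Name);
        }
    }

    // Creates an engine for an explicit culture instead of the system default.
    // Assumption: "en-US" is installed; construction is expected to fail otherwise.
    public static SpeechRecognitionEngine Create()
    {
        return new SpeechRecognitionEngine(new CultureInfo("en-US"));
    }
}
```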

I would look first at the method documented here: http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognitionengine.setinputtowavefile.aspx

You should be able to work it out from there, I think.

Alex Paven
  • An MP3 file is NOT a Wave (.wav) file (and `SetInputToWaveFile()` is only for Wave files), so your solution won't work. – ProgramFOX Jul 27 '13 at 10:17
  • @Soham: Why should I read my article? Did I write something incorrect in it? – ProgramFOX Jul 27 '13 at 11:25
  • I said I read your article. It was good. But can you provide any solution to my problem? – Soham Dasgupta Jul 27 '13 at 11:29
  • @Soham: I read it as 'Read your article', but I think you actually wrote 'I read your article'; I missed the 'I'. Unfortunately, I can't find a solution to your problem. I can only find a way to convert a .wav file to text. – ProgramFOX Jul 27 '13 at 12:17
  • I already converted my audio file to WAV and tried to extract some text. I tried it with an Engadget podcast. The problem, however, is that I cannot transcribe more than 68 words. – Soham Dasgupta Jul 27 '13 at 12:26
  • @ProgramFOX: my solution was actually just a hint to get the OP going in the right direction. I know close to nothing about the topic at hand (as in I didn't work with System.Speech), but right next to that method is another one, SetInputToWaveStream. You can get a wave stream using something like NAudio. Since we're all programmers I assumed we don't really need to be spoon-fed with the exact solution down to the lines of code you need to copy and paste. – Alex Paven Jul 29 '13 at 09:01
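
A rough sketch of the NAudio route mentioned in the last comment, for anyone landing here. It assumes the NAudio package is referenced and that the decoded WAV is in a format the recognizer accepts (resampling may be needed, which this does not do). `Mp3Input` and its methods are hypothetical names; `Mp3FileReader` and `WaveFileWriter.CreateWaveFile` come from NAudio.

```csharp
using System.IO;
using System.Speech.Recognition;
using NAudio.Wave;

public static class Mp3Input
{
    // Decodes the MP3 to PCM and writes it out as a WAV file on disk.
    public static void ConvertMp3ToWav(string mp3Path, string wavPath)
    {
        using (var reader = new Mp3FileReader(mp3Path))
        {
            WaveFileWriter.CreateWaveFile(wavPath, reader);
        }
    }

    // Builds an engine that reads from the converted file through a stream.
    // The caller should dispose the engine and the stream when finished.
    public static SpeechRecognitionEngine CreateEngine(string wavPath, out Stream wavStream)
    {
        wavStream = File.OpenRead(wavPath);
        var sre = new SpeechRecognitionEngine();
        sre.LoadGrammar(new DictationGrammar());
        sre.SetInputToWaveStream(wavStream);
        return sre;
    }
}
```

Writing to a file rather than a `MemoryStream` is a deliberate choice here: an uncompressed podcast can run to hundreds of megabytes. The engine returned by `CreateEngine` can then be driven with the same `Recognize()` loop shown in the accepted answer.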