I'm trying to do speech-to-text on some WAV files using the Microsoft Cognitive Services Speech SDK. It works well enough for some files, but it gets stuck on others. By stuck, I mean that recognition doesn't stop until I cancel it manually.
I first tried the RecognizeOnceAsync method:
private static void processRecording()
{
    var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
    speechConfig.SpeechRecognitionLanguage = "es-MX";
    speechConfig.OutputFormat = OutputFormat.Detailed;
    using (var audioStream = new PushAudioInputStream())
    {
        audioStream.Write(File.ReadAllBytes("myfilepath"));
        using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
        {
            using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
            {
                var result = speechRecognizer.RecognizeOnceAsync().Result;
                switch (result.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                        break;
                    case ResultReason.NoMatch:
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        break;
                    case ResultReason.Canceled:
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}, ErrorCode={cancellation.ErrorCode}, ErrorDetails={cancellation.ErrorDetails}");
                        break;
                }
            }
        }
    }
}
With this, after more than a minute, I get:
CANCELED: Reason=Error, ErrorCode=ServiceTimeout, ErrorDetails=Timeout: no recognition result received SessionId: 322853a3085d41ec9b60ee940531038c
I then tried StartContinuousRecognitionAsync:
private async static Task processRecordingsAsync()
{
    var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
    speechConfig.SpeechRecognitionLanguage = "es-MX";
    speechConfig.OutputFormat = OutputFormat.Detailed;
    var waiter = new System.Threading.ManualResetEvent(false);
    var audioStream = new PushAudioInputStream();
    audioStream.Write(File.ReadAllBytes("myfilepath"));
    var audioConfig = AudioConfig.FromStreamInput(audioStream);
    var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Action cleanup = () =>
    {
        waiter.Set();
        try { speechRecognizer.Dispose(); } catch { }
        try { audioConfig.Dispose(); } catch { }
        try { audioStream.Dispose(); } catch { }
    };
    speechRecognizer.Recognizing += (sender, e) => Console.WriteLine($"Recognizing: {e.Result.Text}");
    speechRecognizer.SessionStarted += (sender, e) => Console.WriteLine($"Recognize session started");
    speechRecognizer.SessionStopped += (sender, e) => Console.WriteLine($"Recognize session stopped");
    speechRecognizer.SpeechEndDetected += (sender, e) => Console.WriteLine($"Speech end detected");
    speechRecognizer.SpeechStartDetected += (sender, e) => Console.WriteLine($"Speech start detected");
    speechRecognizer.Recognized += (sender, e) =>
    {
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            Console.WriteLine($"Recognized text: {e.Result.Text}");
        }
        else
        {
            Console.WriteLine($"Could not recognize text: {e.Result.Reason}");
        }
        cleanup();
    };
    speechRecognizer.Canceled += (sender, e) =>
    {
        Console.WriteLine($"Error trying to recognize text: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}, ErrorDetails = {e.ErrorDetails}");
        cleanup();
    };
    await speechRecognizer.StartContinuousRecognitionAsync();
    if (!waiter.WaitOne(60000))
    {
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
And with this I get:
Recognize session started
Speech start detected
Recognizing: con el
Recognizing: con el servicio de tele
Recognizing: con el servicio de tele terapia
Recognizing: con el servicio de tele terapia de
Recognizing: con el servicio de tele terapia de tercer
Recognize session stopped
Error trying to recognize text: Reason = Error, ErrorCode = ServiceTimeout, ErrorDetails = Timeout while waiting for service to stop SessionId: e289298cf97447b89bd088a665e6c095
So it transcribes about 90% of the file (which is about 4 seconds long), but then it gets stuck and doesn't finish until I force it to stop with StopContinuousRecognitionAsync.
When I try this file in Speech Studio, it recognizes almost exactly the same text, but it does not get stuck.
Note that I am using a free subscription. Could it be because of that? Is there anything else I could try?