We are currently evaluating the Bing Speech Recognition Service in a live streaming scenario. We receive a live stream of PCM-encoded audio (16 kHz sample rate, 16-bit, mono) and are trying to send it to the Bing Speech Recognition service.

We have successfully used the DataRecognitionClient from https://www.nuget.org/packages/Microsoft.ProjectOxford.SpeechRecognition-x64/ with our scenario by sending the audio format before streaming the audio itself, like so:

_dataRecognitionClient.SendAudioFormat(SpeechAudioFormat.create16BitPCMFormat(16000));

We then stream the audio in a loop, like so:

_dataRecognitionClient.SendAudio(buffer, bytesRead);

This works fine. However, we assume that the ProjectOxford library may be deprecated, since the official Bing Speech Recognition website (https://www.microsoft.com/cognitive-services/en-us/Speech-api/documentation/GetStarted/GetStartedCSharpServiceLibrary) points to a different NuGet package: https://www.nuget.org/packages/Microsoft.Bing.Speech/

When we use the SpeechClient from this package, we get the error "Audio format could not be parsed" when calling RecognizeAsync on the SpeechClient.

var speechInput = new SpeechInput(
    producerConsumerStream,
    new RequestMetadata(
        Guid.NewGuid(),
        new DeviceMetadata(DeviceType.Near, DeviceFamily.Desktop, NetworkType.Ethernet,
            OsName.Windows, "Azure", "Microsoft", "Current"),
        new ApplicationMetadata("App", "1.0"),
        "Speech"));
await _speechClient.RecognizeAsync(speechInput, new CancellationToken());

The last line throws the error. We assume this is because our PCM stream has no WAVE/RIFF header, since it is a live stream rather than a file. For this streaming scenario, the DataRecognitionClient offered the SendAudioFormat method, but we cannot find an equivalent on SpeechClient.

Does SpeechClient not support a streaming scenario?

larsbeck

1 Answer

Answering my own question: we solved the issue by prepending a WAVE header to the stream. Because the total length of a live stream is unknown, the header carries a fake total number of samples (i.e. a placeholder length). See: Create valid wav file header for streams in memory
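For illustration, here is a minimal sketch of how such a header could be built for the format described in the question (16 kHz, 16-bit, mono PCM). The class name WavHeader, the method Create, and the placeholder length are assumptions made for this example; they are not taken from the linked answer or from the Microsoft.Bing.Speech package.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeader
{
    // Builds a canonical 44-byte RIFF/WAVE header for uncompressed PCM.
    // dataLength is a placeholder: a live stream has no known length, so a
    // large dummy value is used (an assumption, not a documented requirement).
    public static byte[] Create(int sampleRate = 16000, short bitsPerSample = 16,
                                short channels = 1, int dataLength = int.MaxValue - 36)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        using (var ms = new MemoryStream(44))
        using (var w = new BinaryWriter(ms, Encoding.ASCII))
        {
            // BinaryWriter writes integers in little-endian order, as RIFF requires.
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + dataLength);   // RIFF chunk size = remaining header bytes + data
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                // size of the fmt chunk for PCM
            w.Write((short)1);          // audio format 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(dataLength);        // placeholder data chunk size
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

The returned bytes would be written to the stream once, before the first chunk of PCM audio, assuming the stream exposes the standard Stream.Write method.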

larsbeck
  • Thanks a lot. I want to add that you need to set the position of your stream back to 0. After you write the header into the stream, its position is 44, so the recognition client will still throw the same exception. – sk2andy Aug 05 '18 at 08:05
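
The point in the comment can be sketched with a seekable in-memory stream. This is a minimal illustration, assuming a MemoryStream; the class name StreamRewind and the method Prepare are hypothetical. A custom producer/consumer stream that tracks separate read and write positions may behave differently.

```csharp
using System;
using System.IO;

public static class StreamRewind
{
    // Writes a header and some audio into a seekable in-memory stream,
    // then rewinds it so that a reader starts at the header.
    public static MemoryStream Prepare(byte[] header, byte[] pcm)
    {
        var stream = new MemoryStream();
        stream.Write(header, 0, header.Length);
        // Position now equals header.Length (44 for a canonical WAV header).
        stream.Write(pcm, 0, pcm.Length);
        // Without this reset, a consumer would start reading at the end of the
        // written data and never see the header.
        stream.Position = 0;
        return stream;
    }
}
```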