I want to perform real-time speech recognition on the HoloLens 2 with Unity 2021, using the Microsoft Azure Cognitive Services Speech SDK. Instead of the default HoloLens 2 microphone stream, I want to switch to the "room capture" stream category, which requires using the Windows Microphone Stream (see link). Initializing and starting the Windows Microphone Stream succeeds with this code:
// Create the Windows Microphone Stream.
micStream = new WindowsMicrophoneStream();
if (micStream == null)
{
    Debug.Log("Failed to create the Windows Microphone Stream object");
}

// Initialize the Windows Microphone Stream with the desired stream category.
WindowsMicrophoneStreamErrorCode result = micStream.Initialize(streamType);
if (result != WindowsMicrophoneStreamErrorCode.Success)
{
    Debug.Log($"Failed to initialize the microphone stream. {result}");
    return;
}
else Debug.Log($"Initialized the microphone stream. {result}");

// Start the microphone stream.
result = micStream.StartStream(true, false);
if (result != WindowsMicrophoneStreamErrorCode.Success)
{
    Debug.Log($"Failed to start the microphone stream. {result}");
}
else Debug.Log($"Started the microphone stream. {result}");
I don't have much experience with audio streams, but I assume that for the Speech SDK to receive the room capture audio, I have to feed it this microphone stream. My problem is that I have not found a way to do that. I suspect I would have to implement my own PullAudioInputStreamCallback class (as e.g. here), but I don't know how Read() should be implemented for the Windows Microphone Stream (I sketch my rough idea of such a callback at the end of this question). Alternatively, I considered using a PushStream like so:
SpeechConfig speechConfig = SpeechConfig.FromSubscription(SpeechController.Instance.SpeechServiceAPIKey, SpeechController.Instance.SpeechServiceRegion);
speechConfig.SpeechRecognitionLanguage = fromLanguage;

using (var pushStream = AudioInputStream.CreatePushStream())
{
    using (var audioInput = AudioConfig.FromStreamInput(pushStream))
    {
        using (var recognizer = new SpeechRecognizer(speechConfig, audioInput))
        {
            recognizer.Recognizing += RecognizingHandler;
            ...
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // "MicStreamReader" is the missing piece; it is not implemented!
            using (MicStreamReader reader = new MicStreamReader(micStream))
            {
                byte[] buffer = new byte[1000];
                while (true)
                {
                    var readSamples = reader.Read(buffer, (uint)buffer.Length);
                    if (readSamples == 0)
                    {
                        break;
                    }
                    pushStream.Write(buffer, readSamples);
                }
            }
            pushStream.Close();
        }
    }
}
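One detail I am unsure about with this push-stream idea: as far as I understand, CreatePushStream() without arguments expects 16 kHz, 16-bit, mono PCM, while the Windows Microphone Stream presumably delivers float samples at Unity's output sample rate. If that is true, I assume I would also have to declare the format explicitly when creating the push stream (the sample rate and channel count here are guesses on my part):

// Sketch only: declare the raw PCM format of the pushed audio explicitly.
// Assumption: the mic stream runs at Unity's output sample rate, mono, and
// I convert its float samples to 16-bit PCM before writing.
var format = AudioStreamFormat.GetWaveFormatPCM((uint)AudioSettings.outputSampleRate, 16, 1);
using (var pushStream = AudioInputStream.CreatePushStream(format))
using (var audioInput = AudioConfig.FromStreamInput(pushStream))
{
    // ... create the recognizer and write the converted 16-bit PCM bytes as above ...
}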
But I would need something like a "MicStreamReader" for the push-stream code above, and I don't know how to implement one on top of the Windows Microphone Stream. Could you help me with this approach, or do you know a better one?
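For what it's worth, here is a rough, untested sketch of how I imagine the pull-callback variant could look. The class name WindowsMicStreamPullCallback is mine, and the use of ReadAudioFrame, the single-channel assumption, the sample rate taken from AudioSettings.outputSampleRate, and the float-to-16-bit conversion are all guesses on my part rather than something I have verified against the MRTK or Speech SDK documentation:

// Rough sketch of a pull callback on top of the MRTK WindowsMicrophoneStream.
// Everything marked "assumption" is a guess on my part.
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.MixedReality.Toolkit.Audio;
using UnityEngine;

public class WindowsMicStreamPullCallback : PullAudioInputStreamCallback
{
    private readonly WindowsMicrophoneStream micStream;
    private readonly int numChannels;

    public WindowsMicStreamPullCallback(WindowsMicrophoneStream micStream, int numChannels = 1)
    {
        this.micStream = micStream;
        this.numChannels = numChannels;
    }

    // Called by the Speech SDK whenever it needs more audio. It expects up to
    // "size" bytes of raw PCM and the number of bytes actually written; 0 means end of stream.
    public override int Read(byte[] dataBuffer, uint size)
    {
        int sampleCount = (int)size / 2; // 16-bit PCM: 2 bytes per sample.
        float[] floatBuffer = new float[sampleCount];

        // Assumption: ReadAudioFrame fills the float buffer with the most recent
        // microphone samples and may be called from the SDK's background thread.
        WindowsMicrophoneStreamErrorCode result = micStream.ReadAudioFrame(floatBuffer, numChannels);
        if (result != WindowsMicrophoneStreamErrorCode.Success)
        {
            return 0;
        }

        // Convert Unity's float samples (-1..1) to 16-bit little-endian PCM.
        for (int i = 0; i < sampleCount; i++)
        {
            short sample = (short)(Mathf.Clamp(floatBuffer[i], -1f, 1f) * short.MaxValue);
            dataBuffer[2 * i] = (byte)(sample & 0xff);
            dataBuffer[2 * i + 1] = (byte)((sample >> 8) & 0xff);
        }
        return sampleCount * 2;
    }
}

And the wiring, also a sketch with the same format assumptions:

var format = AudioStreamFormat.GetWaveFormatPCM((uint)AudioSettings.outputSampleRate, 16, 1);
var pullStream = AudioInputStream.CreatePullStream(new WindowsMicStreamPullCallback(micStream), format);
using (var audioInput = AudioConfig.FromStreamInput(pullStream))
using (var recognizer = new SpeechRecognizer(speechConfig, audioInput))
{
    recognizer.Recognizing += RecognizingHandler;
    await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
}

If this is even roughly the right direction, my main open questions are whether ReadAudioFrame can be called from the Speech SDK's background thread like this, and whether the sample rate and channel count I pass to GetWaveFormatPCM match what the room capture stream actually delivers.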