I'm making a program that uses Windows Speech Recognition to listen out for commands and I am using the Speech Synthesizer to provide real-time feedback. I was wondering whether it would be possible to use the result from the synthesizer to create an audio wave (similar to what you would see in something like Audacity when you record your voice), that would be displayed in real-time as the synthesizer continues to speak. I am trying to give the effect of being able to 'see' the program talk, not just hear it. I have no idea where to start and any advice/help will be greatly appreciated.
-
An answer to this question would require a tutorial or book or extremely long answer, which is all off-topic here. Which concepts do you use already? Do you use a library? How familiar are you with devices/drivers? Which Microsoft API are you using for speech synthesis? I once worked with MS SAPI and that could e.g. save a WAV file instead of speaking. You could then play and display the WAV file. Perhaps there are better ways today, so let people know what you're using and what you have tried already. – Thomas Weller Sep 18 '15 at 22:39
1 Answer
From Windows Vista on you can capture the audio buffer of the current audio session via the Windows Audio Session API (WASAPI).
Now WASAPI isn't great to call from managed applications. You might need to P/Invoke the functions. But you are in luck! There is a managed library wrapping that API: CSCore.
It provides a number of useful objects to play around with audio buffers and streams. You can load the package into your project via NuGet.
To create a Stream for capturing the live audio buffer you'd need to do something like this:
using (WasapiCapture capture = new WasapiLoopbackCapture())
{
    capture.Initialize();
    using (MemoryStream mstr = new MemoryStream())
    using (WaveWriter wvWriter = new WaveWriter(mstr, capture.WaveFormat))
    {
        capture.DataAvailable +=
            (object sender, DataAvailableEventArgs e) =>
            {
                wvWriter.Write(e.Data, e.Offset, e.ByteCount);
                // Do some stuff with that data!
            };
        capture.Start(); // start filling the buffer
        // ... keep capturing while the synthesizer speaks ...
        capture.Stop();
    }
}
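Inside the `DataAvailable` handler you get raw bytes, not samples. A minimal sketch of decoding them, assuming `capture.WaveFormat` reports 32-bit IEEE_FLOAT (check `WaveFormat.BitsPerSample` before relying on this; the helper name is mine):

```csharp
using System;

static class SampleDecoder
{
    // Decode raw capture bytes into interleaved float samples
    // (for stereo: left, right, left, right, ...).
    // Assumes the buffer holds 32-bit IEEE float PCM.
    public static float[] ToFloatSamples(byte[] data, int offset, int byteCount)
    {
        int sampleCount = byteCount / sizeof(float); // 4 bytes per sample
        float[] samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++)
            samples[i] = BitConverter.ToSingle(data, offset + i * sizeof(float));
        return samples;
    }
}
```

You would call this with `e.Data`, `e.Offset` and `e.ByteCount` from the event args and feed the resulting floats to whatever draws your waveform.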
To learn how to create a waveform of the data you pump into the stream you might want to check out some tutorials. (Hint: Ask Google)
To get you on the way, have a look at this Stack Overflow question or this CodeProject article.
Also note that most tutorials cover how to create waveforms of the standard 44.1 kHz, 16-bit stereo PCM audio format.
Windows likes to buffer its audio internally as 32-bit IEEE_FLOAT stereo PCM, often at a higher sample rate; check `capture.WaveFormat` for the actual values. This means you'll have tens of thousands of 32-bit samples per second to process, interleaved across 2 channels, with float values ranging from -1.0 to +1.0 (instead of -32,768 to +32,767 integer values).
Windows does this internally because floating-point samples are better for mixing different audio sources.
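To render those float samples as a waveform you typically reduce each block of samples to one min/max pair per pixel column and draw a vertical line between the two. A sketch under that assumption (the helper name and column scheme are mine, not from any library):

```csharp
using System;

static class WaveformReducer
{
    // Reduce float samples (range -1.0 .. +1.0) to one (min, max)
    // peak pair per pixel column of the waveform view.
    public static (float Min, float Max)[] ToPeaks(float[] samples, int columns)
    {
        var peaks = new (float Min, float Max)[columns];
        int perColumn = Math.Max(1, samples.Length / columns);
        for (int c = 0; c < columns; c++)
        {
            float min = 0f, max = 0f;
            int end = Math.Min((c + 1) * perColumn, samples.Length);
            for (int i = c * perColumn; i < end; i++)
            {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            peaks[c] = (min, max);
        }
        return peaks;
    }
}
```

Each pair then maps to a vertical line: scale `Min` and `Max` by half the control's height around its vertical center, and repaint as new `DataAvailable` blocks arrive to get the real-time effect.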