I've been playing around with some speech-to-text and text-to-speech systems, and am running into the problem that when the computer makes sounds that it can recognize, it starts taking commands from itself. To avoid this, I'd like a stream of all sounds picked up by the microphone that were not produced by the computer itself.
I see that PulseAudio has an echo cancellation module, but so far I have been unable to distinguish between its output and the raw microphone output: it still contains all the sounds picked up by the microphone that came from the computer speakers. I wonder if the default echo canceller is doing the opposite of what I want (i.e., it keeps sounds heard by the microphone from being played back through the speakers).
Any idea how I can do this (preferably with pacmd)? I have thoroughly confused myself trying to specify non-default sources for the echo canceller, and have wandered into loopback modules and other things that are probably irrelevant. I know very little about PulseAudio, haven't found a good introduction to it (I've looked through much of the PulseAudio documentation but didn't see anything relevant), and might just be missing something simple. I feel frustrated that echo cancellation apparently doesn't work, I can't find documentation on it, and I can't find examples of it working from other people.
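For concreteness, here is a minimal sketch of the kind of setup being asked about, not a confirmed working configuration. It assumes the stock module-echo-cancel and its source_name and sink_name arguments; the names aec_source and aec_sink are made up for illustration. The underlying assumption is that the canceller can only subtract audio that is played through its own virtual sink, so both recording and playback have to go through the module's devices rather than the raw hardware ones:

```
# Load the echo canceller; it creates a virtual source and a virtual sink
# wrapped around the default hardware devices.
# (aec_source / aec_sink are illustrative names, not defaults.)
load-module module-echo-cancel source_name=aec_source sink_name=aec_sink

# Record from the cancelled source instead of the raw microphone.
set-default-source aec_source

# Play through the canceller's sink so it has a reference signal to subtract.
set-default-sink aec_sink
```

These lines use the PulseAudio command language, so they could be typed at the interactive pacmd prompt or, if they turn out to work, placed in default.pa. Streams that are already running may stay attached to the old devices until they are moved or restarted.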
Thanks in advance for the help!
Other details that might be relevant: I'm running Ubuntu Saucy on a Lenovo Thinkpad T410. I'm using the built-in microphone and speakers (so I'm fairly sure they share a sound card and clock drift shouldn't be an issue). My actual application gets its sound through GStreamer, but GStreamer gets it from PulseAudio, and I don't think GStreamer itself has acoustic echo cancellation (AEC) capabilities. If there's a different way of doing this, I'd gladly switch to that.