(How) Can I get a stream of all sounds recorded from the microphone that my computer did not produce? (using PulseAudio or something else)

Question

I've been playing around with some speech-to-text and text-to-speech systems, and am running into the problem that when the computer makes sounds that it can recognize, it starts taking commands from itself. To avoid this, I'd like a stream of all sounds picked up by the microphone that were not produced by the computer itself.

I see that PulseAudio has an echo cancellation module, but so far I have been unable to distinguish between its output and the raw microphone output: it still contains all the sounds picked up by the microphone that came from the computer speakers. I wonder if the default echo canceller is doing the opposite of what I want (i.e., it removes sounds heard by the microphone from being sent to the speakers).

Any idea how I can do this (preferably with pacmd)? I have thoroughly confused myself trying to specify non-default sources for the echo canceller, and have wandered into loopback modules and other things that are probably irrelevant. I know very little about PulseAudio, haven't found a good introduction to it (I've looked through much of the PulseAudio documentation but didn't see anything relevant), and might just be missing something simple. I feel frustrated that echo cancellation apparently doesn't work, I can't find documentation on it, and I can't find examples of it working from other people.

Thanks in advance for the help!

Other details that might be relevant: I'm running Ubuntu Saucy on a Lenovo Thinkpad T410. I'm using the built-in microphone and speakers (so, I'm pretty sure they're using the same sound card and I won't have clock drift issues). My actual application gets its sound through GStreamer, but GStreamer gets it from PulseAudio, and I don't think GStreamer itself has AEC capabilities. If there's a different way of doing this, I'd gladly switch to that.

Maybe try to mute voice frequency range with equalizer for computer sounds? — victor1234, Mar 29 '14 at 15:36
The overall goal is for me to speak to the computer and for it to speak back to me. If this only works when I'm already close enough to use headphones, I might as well just use the keyboard. Similarly, removing the voice frequency range from the microphone will prevent it from hearing me (or perhaps you're suggesting that I mute voice frequencies in the output, which would mess up lots of other sounds). — Alan, Mar 29 '14 at 16:04

Alan · Accepted Answer · 2014-03-31T13:18:27.560

Ah, I've got it! Merely loading the echo cancellation plugin isn't enough; you then need to start using it. In particular, it will only cancel echos of sounds passed into it, and if no sounds go through it, nothing will be cancelled. So, open /etc/pulse/default.pa and add the line

load-module module-echo-cancel

towards the bottom (I put it right after the line that loads module-filter-apply). Then, restart the PulseAudio daemon by running (as a non-root user) pulseaudio -k. Next, run pacmd to get a command line interface to PulseAudio, and give it the commands list-sources and list-sinks. Note the indices of the echo canceller in the responses. Edit /etc/pulse/default.pa again, and uncomment the two lines at the end about set-default, replacing the words input and output with the indices of the echo canceller's source and sink. Finally, restart PulseAudio again with pulseaudio -k (again, run as a non-root user).

Now, by default all sounds to be output get sent through the echo canceller before heading to the speakers, and all sounds to be input get pulled from the echo canceller after coming in through the microphone, and things actually work. You can verify that it's working by running pavucontrol and looking at the sound levels on the Input Devices screen (try playing some music and speaking, and note that the echo cancelled input shows normal sound levels when you speak but very low levels (verging on nothing) when you're silent but the music is playing).

This answer mostly comes from this post, which I wish I'd found weeks ago.

(How) Can I get a stream of all sounds recorded from the microphone that my computer did not produce? (using PulseAudio or something else)

1 Answers1