
We have written a script that records two WAV files: 1. the ambient noise alone, and 2. the ambient noise together with a voice.

We then want to use those two WAV files as inputs to a third function that subtracts the ambient-noise recording from the ambient-noise-with-voice recording. The problem is that when we run the script and call the combination() function, the resulting WAV file simply mixes the two preceding recordings. Our goal is an output in which the ambient noise is reduced and the voice is heard more loudly above it. Here is our script below:

import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp

def ambient():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "ambientnoise.wav"

    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print ("recording...")
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print ("finished recording")

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

    x = scipy.io.wavfile.read('ambientnoise.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)

    times = np.linspace(0, len(n), len(n))
    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times,n)
    plt.show()

def voice():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "ambientwithvoice.wav"

    audio = pyaudio.PyAudio()

    # start Recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print ("recording...")
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print ("finished recording")

    # stop Recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

    x = scipy.io.wavfile.read('ambientwithvoice.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)

    times = np.linspace(0, len(n), len(n))
    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times,n)
    plt.show()

def combination():
    rate1,Data1 = scipy.io.wavfile.read('ambientnoise.wav')
    rate2,Data2 = scipy.io.wavfile.read('ambientwithvoice.wav')
    new_Data = [0]*len(Data1)
    for i in range(0,len(Data1)):
        new_Data[i] = Data2[i] + Data1[i]
    new_Data = np.array(new_Data)
    scipy.io.wavfile.write('filtered.wav', rate1, new_Data)

    x = scipy.io.wavfile.read('ambientwithvoice.wav')
    n = x[1]
    y = np.zeros(n.shape)
    y = n.cumsum(axis=0)

    times = np.linspace(0, len(n), len(n))
    plt.title("Plot 261 $speech1.wav\n $Secades, M.F.\spadesuit SIGNLAB \spadesuit 6Feb2018$")
    plt.xlabel("n")
    plt.ylabel("$speech1.wav$")
    plt.plot(times,n)
    plt.show() 
Markus
  • Have you considered using `-` instead of `+` in `Data2[i] + Data1[i]`? – MB-F May 15 '18 at 07:38
  • @kazemakase yes, but it led to the same output, which is very strange and confusing at the same time. I even tried inverting one of the wav files and then combining it afterwards, but it also led to the same results. :( – Markus May 15 '18 at 07:48
  • I see... The noise in both recordings is not exactly the same, so it cannot cancel out (it is only *statistically* the same, if you are lucky). If it really is *noise* then simply adding/subtracting will [increase the noise level rather than reduce it](https://en.wikipedia.org/wiki/Variance#Sum_of_uncorrelated_variables_(Bienaym%C3%A9_formula)). You will need a different and much more complicated approach. I don't even know where to point you for a start. Try [Wikipedia](https://en.wikipedia.org/wiki/Noise_reduction) for ideas. Welcome to the world of statistical signal processing! :) – MB-F May 15 '18 at 08:08
  • @kazemakase hahaha, alright I will try to research further. Still, I'm hoping someone can answer this inquiry of mine. – Markus May 15 '18 at 08:28
  • As it stands now, I don't think anyone can really answer the inquiry. What exactly is the question you want answered? – MB-F May 15 '18 at 08:49
  • @kazemakase I think for me, I just wanted to know the reason why my theory didn't work, regardless of the operation and/or method I used. It is really confusing for me and I just want someone to explain it to me thoroughly. That said, you did more than enough to explain it and I already understood, so thanks for your explanation! :) – Markus May 15 '18 at 14:06

1 Answer

We have written a script that records two WAV files: 1. the ambient noise alone, and 2. the ambient noise together with a voice.

This means that, while the ambient noise is continuously going on in the background, two different recordings are made, one after the other. The first records only the noise; the second also has speech in it.

To simplify the explanation, let's assume the speech is not present (maybe the speaker simply said nothing). The approach works the same way either way: the noise from the first recording is used to reduce the noise in the second recording, and it does not matter whether another signal is present in the second recording or not. We know we were successful if the noise is reduced.

The situation looks like this:

[figure: the two recorded signals, noise only and noise plus (absent) speech]

Now let's combine the two recordings either by adding them or by subtracting:

[figure: the result of adding and of subtracting the two recordings]

Apparently, neither approach reduced the noise. Looking closely, the situation got worse: the noise amplitude in the resulting signal is higher than in either of the two recordings!

For the subtraction to work, the signal we subtract must be an exact replica of the noise in the speech recording (or at least a reasonable approximation). Therein lies the problem: we do not know the noise signal, because it looks different every time we record it.
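
To see this numerically, here is a small check with made-up values (no real recordings involved; purely illustrative). Two independently drawn noise signals grow by a factor of about sqrt(2) when added or subtracted, while an exact copy of the noise cancels completely:

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
noise_1 = rng.normal(0, sigma, 44100)  # stands in for the noise-only recording
noise_2 = rng.normal(0, sigma, 44100)  # stands in for the noise in the speech recording

print(np.std(noise_2 - noise_1))  # ~1.41 * sigma: subtracting independent noise makes it louder
print(np.std(noise_2 + noise_1))  # ~1.41 * sigma: adding does too
print(np.std(noise_2 - noise_2))  # 0.0: only an exact replica cancels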

So what can we do?

  • Use a second microphone that records the noise at the same time as the speech, but does not record the speaker.

  • Apply domain knowledge (#1): if you know, for example, that the noise is in a different frequency range than the speech signal, filters can reduce the noise part.

  • Apply domain knowledge (#2): if the noise is predictable (e.g. something periodic like a fan or an engine), create a mathematical model that predicts the noise and subtract that from the speech signal. A rough frequency-domain sketch of this idea follows the list.

  • If the noise is "real noise" (statistically independent and broad-band) such as Gaussian white-noise, we're pretty much out of luck.
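
As a rough sketch of the modelling idea: if the noise is approximately stationary, its average magnitude spectrum can serve as a simple model. The snippet below estimates that profile from ambientnoise.wav and subtracts it from the spectrum of ambientwithvoice.wav frame by frame (spectral subtraction). The function name, parameters, and output filename are made up for illustration; this is a sketch, not a tuned implementation:

import numpy as np
import scipy.io.wavfile
import scipy.signal as sp

def spectral_subtraction(noise_file='ambientnoise.wav',
                         speech_file='ambientwithvoice.wav',
                         out_file='denoised.wav', nperseg=1024):
    rate_n, noise = scipy.io.wavfile.read(noise_file)
    rate_s, speech = scipy.io.wavfile.read(speech_file)

    # work on one channel, as floating point
    if noise.ndim > 1:
        noise = noise[:, 0]
    if speech.ndim > 1:
        speech = speech[:, 0]
    noise = noise.astype(np.float64)
    speech = speech.astype(np.float64)

    # short-time Fourier transforms of both recordings
    _, _, N = sp.stft(noise, fs=rate_n, nperseg=nperseg)
    _, _, S = sp.stft(speech, fs=rate_s, nperseg=nperseg)

    # average noise magnitude per frequency bin: the "noise profile"
    noise_profile = np.abs(N).mean(axis=1, keepdims=True)

    # subtract the profile from the speech magnitudes, keep the speech phase,
    # and clip at zero so no negative magnitudes are created
    mag = np.maximum(np.abs(S) - noise_profile, 0.0)
    cleaned = mag * np.exp(1j * np.angle(S))

    _, out = sp.istft(cleaned, fs=rate_s, nperseg=nperseg)
    scipy.io.wavfile.write(out_file, rate_s, out.astype(np.int16))

How well this works depends on how stationary the noise really is: a steady fan hum is a good candidate, a busy street much less so.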

MB-F