Given a wav file (mono 16KHz sampling rate) of an audio recording of a human talking, is there a way to extract just the voice, thereby filtering out most mechanical and background noise? I'm trying to use librosa
package in Python 3.6 for this, but can't figure out how piptrack
works (or if there is a simpler way).
When tried using an fft/ifft to restrict frequencies to 300-3400 range, the resulting sound was severely distorted.
sr, y = scipy.io.wavfile.read(wav_file_path)
x = np.fft.rfft(y)[0:3400]
x[0:300] = 0
x = np.fft.irfft(x)