Numpy RFFT/IRFFT volume

Question

I'm running rfft and irfft on audio read from a WAV file:

import numpy as np
from scipy.io import wavfile

samplerate, data = wavfile.read(location)
input = data.T[0] # first track of audio
fftData = np.fft.rfft(input[sample:], length)
output = np.fft.irfft(fftData).astype(data.dtype)

So it reads from a file and then runs rfft on the samples. However, the result is very noisy when I play it back through a PyAudio stream. I searched for an answer and tried the solution from this question:

rfft or irfft increasing wav file volume in python

That is why I have the .astype(data.dtype) after the irfft. It reduced the noise a bit, but the audio still sounds all wrong.

This is the playback code, where p is the PyAudio instance:

stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=True)

stream.write(output)    
stream.stop_stream()
stream.close()    
p.terminate()

So what am I doing wrong here?

Thanks!

Edit: I also tried .astype(dtype=np.float32) after the irfft, since the PyAudio stream is opened as float32. It was still noisy.

I think you need to divide your output by `len(fftData)` to account for the correct normalization of the transform. — rammelmueller, Mar 13 '18 at 13:43
I don't understand, why should it be divided by the length of the fftData? But yes, it seems to be a normalization issue. — bnc, Mar 14 '18 at 10:33
You should try to find out if this is a problem with the FFT or a problem with PyAudio, and ask your question about just one of them. You could try to save the result of the FFT/IFFT to a file and analyze it with an external program. If that looks fine, you could try to play some "correct" sound with PyAudio, to see if you are using it right. — Matthias, Mar 15 '18 at 09:32
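
One way to separate the two, along the lines of the last comment, is to check the FFT round trip on its own, with no file and no playback involved. The sketch below uses a synthetic int16 tone (an assumption, not the audio from the question) and compares the irfft output with the input. With NumPy's default settings the inverse transform already applies the 1/n scaling, so if the difference printed here is tiny, the transform pair is probably not where the volume change comes from.

```python
import numpy as np

# Synthetic stand-in for one channel of 16-bit audio (not the asker's file).
rate = 8000
t = np.arange(rate) / rate
signal = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Forward and inverse real FFT with NumPy's default normalization.
spectrum = np.fft.rfft(signal)
# Passing n explicitly keeps the output length equal to the input length,
# which matters when the input has an odd number of samples.
restored = np.fft.irfft(spectrum, n=len(signal))

# Largest absolute difference between the input and the round-tripped signal.
max_error = np.max(np.abs(restored - signal))
print(max_error)
```

If the round trip is clean, the remaining suspects are the dtype conversion and the way the samples are handed to PyAudio.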

bnc · Answer 1 · 2018-03-14T11:05:28.000

The best working solution so far seems to be normalizing by the median value and using .astype(np.float32), since the PyAudio output format is float32:

samplerate, data = wavfile.read(location)
input = data.T[0] # first track of audio
fftData = np.fft.rfft(input[sample:], length)
fftData = np.divide(fftData, np.median(fftData))
output = np.fft.irfft(fftData).astype(dtype=np.float32)

If anyone has a better solution, I'd like to hear it. I tried normalizing by the mean, but the audio still clipped; normalizing by np.max made the whole audio too quiet. This FFT normalization problem keeps giving me trouble, and I haven't found a 100% working solution here on SO.
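
A possible alternative to dividing the spectrum, offered as a sketch rather than a confirmed fix: if the WAV file holds signed integer PCM (the question does not say, so this is an assumption), the samples can be scaled to the [-1.0, 1.0] range that float32 playback conventionally expects before any FFT work is done. The helper name `to_float32_pcm` and the synthetic tone are made up for illustration.

```python
import numpy as np

def to_float32_pcm(samples):
    """Scale signed integer PCM to float32 in [-1.0, 1.0); cast anything else.

    Hypothetical helper. It assumes signed formats such as int16 or int32;
    unsigned 8-bit WAV data would also need its offset removed first.
    """
    if np.issubdtype(samples.dtype, np.signedinteger):
        full_scale = float(np.iinfo(samples.dtype).max) + 1.0  # 32768 for int16
        return (samples / full_scale).astype(np.float32)
    return samples.astype(np.float32)

# Synthetic int16 tone standing in for data.T[0] from the question.
rate = 8000
t = np.arange(rate) / rate
track = (20000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

scaled = to_float32_pcm(track)
spectrum = np.fft.rfft(scaled)
# No extra division here: irfft already undoes the scaling of rfft.
output = np.fft.irfft(spectrum, n=len(scaled)).astype(np.float32)

# Raw float32 bytes, in the form a paFloat32 stream would be given.
frames = output.tobytes()
print(output.dtype, float(np.max(np.abs(output))))
```

With this approach the level of the output follows the level of the input, so no median, mean, or max division of `fftData` is needed. Whether it removes the noise in the original setup depends on the actual file format and on how the stream is written, which this sketch does not cover.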
