Python NumPy - FFT and Inverse FFT?

Question

I've been working with FFT, and I'm currently trying to read a sound waveform from a file, take its FFT (and eventually modify it), and then write the resulting waveform back to a file. I've gotten the FFT of the sound wave and then used an inverse FFT function on it, but the output file doesn't sound right at all. I haven't done any filtering on the waveform; I'm just testing getting the frequency data and putting it back into a file. It should sound the same, but it sounds wildly different.

I have since worked on this project a bit more, but haven't yet gotten the desired results. The output sound file is noisy (it is louder and has extra noise that wasn't present in the original file), and sound from one channel has leaked into the other channel (which was previously silent). The input sound file is a stereo, 2-channel file with sound coming from only one channel. Here's my code:

import scipy
import wave
import struct
import numpy
import pylab

from scipy.io import wavfile

rate, data = wavfile.read('./TriLeftChannel.wav')

filtereddata = numpy.fft.rfft(data, axis=0)
print(data)

filteredwrite = numpy.fft.irfft(filtereddata, axis=0)
print(filteredwrite)

wavfile.write('TestFiltered.wav', rate, filteredwrite)

I don't quite see why this doesn't work.

I've zipped up the problem .py file and audio file, if that can help solve the issue here.

try adding `filteredwrite = numpy.round(filteredwrite).astype('int16')` before you save — Bi Rico, Apr 26 '12 at 15:50
@Bago - Thanks a lot! That totally fixed the problem. I was wondering, does forcing the filtered ifft to 'int16' mean that it will be a 16-bit depth sound file? — SolarLune, Apr 27 '12 at 02:00
I don't know too much about wav files, I always assumed they were raw, uncompressed data, but you'll have to read up on the wav format specs to know for sure. — Bi Rico, Apr 27 '12 at 17:27
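A minimal sketch of the fix suggested in the comments above, using synthetic data instead of the asker's file so it runs on its own. The helper name `fft_round_trip` and the test tone are made up for illustration, and the sketch assumes integer PCM input such as int16. `irfft` returns float64, so the result is rounded, clipped, and cast back to the input dtype before it would be written out; writing int16-scale amplitudes as float samples is a likely reason the output sounded loud and noisy.

```python
import numpy as np

def fft_round_trip(data):
    """rfft then irfft each channel of an (n_samples, n_channels) integer array.

    Hypothetical helper for illustration; assumes an integer dtype such as int16.
    """
    n_samples = data.shape[0]
    spectrum = np.fft.rfft(data, axis=0)
    # ... modify spectrum here ...
    # Passing n restores the original length even when n_samples is odd.
    restored = np.fft.irfft(spectrum, n=n_samples, axis=0)
    # Round, clip to the valid integer range, and cast back to the input dtype.
    info = np.iinfo(data.dtype)
    return np.clip(np.round(restored), info.min, info.max).astype(data.dtype)

# Synthetic stand-in for the stereo file: a tone on the left, silence on the right.
rate = 8000
t = np.arange(rate + 1) / rate  # odd sample count on purpose
left = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
data = np.column_stack([left, np.zeros_like(left)])

restored = fft_round_trip(data)
print(restored.dtype, restored.shape, np.array_equal(restored, data))
```

The returned array can be passed to `wavfile.write` as in the question. scipy chooses the sample format from the array's dtype, so an int16 array is written as 16-bit PCM.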

score 7 · Answer 1 · answered Apr 19 '12 at 06:39

You don't appear to be applying any filter here
You probably want to take the ifft of the fft (post-filtering), not of the input waveform.

wim

score 5 · Answer 2 · answered Apr 19 '12 at 06:49

Shouldn't it be more like this?

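filtereddata = numpy.fft.fft(data, axis=0)  # transform each channel (column) separately
# do fft stuff to filtereddata
filteredwrite = numpy.fft.ifft(filtereddata, axis=0).real  # ifft returns complex; keep the real part
# cast back to the input sample type before writing
wavfile.write('TestFiltered.wav', rate, numpy.round(filteredwrite).astype(data.dtype))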

John La Rooy

@wim - Sorry about that - edited my original post to have more info. – SolarLune Apr 19 '12 at 21:34

score 5 · Accepted Answer · answered Apr 25 '12 at 01:37

>>> import numpy as np
>>> a = np.vstack([np.ones(11), np.arange(11)])

# We have two channels along axis 0, the signals are along axis 1
>>> a
array([[  1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.],
       [  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.]])
>>> np.fft.irfft(np.fft.rfft(a, axis=1), axis=1)
array([[  1.1       ,   1.1       ,   1.1       ,   1.1       ,
          1.1       ,   1.1       ,   1.1       ,   1.1       ,
          1.1       ,   1.1       ],
       [  0.55      ,   1.01836542,   2.51904294,   3.57565618,
          4.86463721,   6.05      ,   7.23536279,   8.52434382,
          9.58095706,  11.08163458]])
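# By default irfft returns an even length (10 here) along axis=1, even though a was (2, 11);
# pass n=a.shape[1] to irfft to get the odd length back.

# When a has even length along axis 1, irfft(rfft(a)) returns a (to within floating-point error).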
>>> a = np.vstack([np.ones(10), np.arange(10)])
>>> np.fft.irfft(np.fft.rfft(a, axis=1), axis=1)
array([[  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          1.00000000e+00],
       [  7.10542736e-16,   1.00000000e+00,   2.00000000e+00,
          3.00000000e+00,   4.00000000e+00,   5.00000000e+00,
          6.00000000e+00,   7.00000000e+00,   8.00000000e+00,
          9.00000000e+00]])

# It seems like your signals are along axis 0; here is an example where the signals are on axis 0
>>> a = np.vstack([np.ones(10), np.arange(10)]).T
>>> a
array([[ 1.,  0.],
       [ 1.,  1.],
       [ 1.,  2.],
       [ 1.,  3.],
       [ 1.,  4.],
       [ 1.,  5.],
       [ 1.,  6.],
       [ 1.,  7.],
       [ 1.,  8.],
       [ 1.,  9.]])
>>> np.fft.irfft(np.fft.rfft(a, axis=0), axis=0)
array([[  1.00000000e+00,   7.10542736e-16],
       [  1.00000000e+00,   1.00000000e+00],
       [  1.00000000e+00,   2.00000000e+00],
       [  1.00000000e+00,   3.00000000e+00],
       [  1.00000000e+00,   4.00000000e+00],
       [  1.00000000e+00,   5.00000000e+00],
       [  1.00000000e+00,   6.00000000e+00],
       [  1.00000000e+00,   7.00000000e+00],
       [  1.00000000e+00,   8.00000000e+00],
       [  1.00000000e+00,   9.00000000e+00]])
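As a small follow-up sketch to the odd-length case above: `numpy.fft.irfft` accepts an `n` argument giving the output length, so an odd-length signal can be recovered by passing the original length. The variable names here are illustrative only.

```python
import numpy as np

a = np.vstack([np.ones(11), np.arange(11)])  # odd length (11) along axis 1
spectrum = np.fft.rfft(a, axis=1)            # 6 frequency bins per channel

# Without n, irfft assumes an even output length of 2 * (6 - 1) = 10.
default_len = np.fft.irfft(spectrum, axis=1)

# With n set to the original length, the odd-length signal is recovered.
exact = np.fft.irfft(spectrum, n=a.shape[1], axis=1)

print(default_len.shape, exact.shape, np.allclose(exact, a))
```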

score 2 · Answer 4 · answered Apr 19 '12 at 22:50

Two problems.

You are FFTing 2-channel data. You should FFT only one channel of mono data at a time for the FFT results to make ordinary sense. If you want to process 2 channels of stereo data, you should IFFT(FFT()) each channel separately.

You are using a real fft, which throws away information, and thus makes the fft non-invertible.

If you want to invert, you will need to use an FFT which produces a complex result, and then IFFT this complex frequency domain vector back to the time domain. If you modify the frequency domain vector, make sure it stays conjugate symmetric if you want a strictly real result (minus numerical noise).
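To illustrate the point about conjugate symmetry, here is a rough sketch of a low-pass filter built on the complex FFT. The test signal, the sample rate, and the 100 Hz cutoff are assumptions chosen for the example. Masking on the absolute frequency treats each positive bin and its negative counterpart alike, which keeps the spectrum conjugate symmetric, so the inverse transform is real apart from numerical noise.

```python
import numpy as np

rate = 1000                      # assumed sample rate in Hz
t = np.arange(rate) / rate       # one second of samples
low_tone = np.sin(2 * np.pi * 50 * t)
signal = low_tone + 0.5 * np.sin(2 * np.pi * 300 * t)

spectrum = np.fft.fft(signal)
freqs = np.fft.fftfreq(signal.size, d=1 / rate)

# Zero every bin above the cutoff on both the positive and negative side.
spectrum[np.abs(freqs) > 100] = 0

filtered = np.fft.ifft(spectrum)
print(np.max(np.abs(filtered.imag)))  # tiny: only numerical noise remains
low_passed = filtered.real
```

If only the positive-frequency bins were zeroed, the symmetry would be broken and the imaginary part of the result would no longer be negligible.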

hotpaw2

you can fft multi-channel data, you just need to use a 2d array and make sure the axis keyword is set correctly (-1 by default), and `irfft(rfft(n))` should return n (within machine precision). – Bi Rico Apr 19 '12 at 23:20
* `irfft(rfft(n))` seems to be best behaved if n.shape[axis] is even. – Bi Rico Apr 19 '12 at 23:37
@Bago - Sorry about taking so long to go about this, but could you expand a bit on what you mean? What do you mean by 'use a 2d array'? You mean a NumPy array, right? – SolarLune Apr 25 '12 at 01:22
@Bago - I think I'm doing it correctly, but it's difficult to tell. The array is being read in with wavfile.read('./TriLeftChannel.wav'), but the shape is (x, 2), with x being a high number of samples. So, it's already a 2 dimensional array. I'm specifying the axis when I use FFT and IFFT, but it didn't change the output... – SolarLune Apr 25 '12 at 01:36
@SolarLune, In cases like this a short example would really help. I think you'll find that if you give enough info on SO to help others reproduce your problem you'll get a lot more feedback. – Bi Rico Apr 25 '12 at 01:41
@Bago - You're right. I've posted my problem code above, and made a zip with the .py file and my audio file, in case that's helpful. – SolarLune Apr 26 '12 at 07:02
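A short sketch of the `axis` point raised in the comments above, assuming data shaped `(n_samples, 2)` like the array `wavfile.read` returned for the asker. The tiny synthetic array is made up for the example. With `axis=0` each channel gets its own spectrum; with the default `axis=-1` the transform runs across the two channels of each sample instead, which mixes them.

```python
import numpy as np

n = 8
# Shape (8, 2): a ramp on the left channel, silence on the right.
stereo = np.column_stack([np.arange(n, dtype=float), np.zeros(n)])

per_channel = np.fft.rfft(stereo, axis=0)  # shape (5, 2): one spectrum per channel
across = np.fft.rfft(stereo)               # axis=-1: a 2-point FFT of each (left, right) pair

print(per_channel.shape, across.shape)
print(np.allclose(per_channel[:, 1], 0))   # the silent channel keeps an all-zero spectrum
print(np.allclose(across[:, 1], 0))        # False: this column holds left minus right
```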
