How to Make Sense of Fourier Transform Results in Audio Frequency Analysis

Question

I am doing audio analysis in Python. My end goal is to get a list of frequencies and their respective volumes, like { frequency : volume (0.0 - 1.0) }.

I have my audio data as a list of frames with values between -1.0 and +1.0. I used NumPy's Fourier transform on this list, numpy.fft.fft(), but the resulting data makes no sense to me.

I do understand that the Fourier transform converts from the time domain to the frequency domain, but not quite how it works mathematically. That's why I don't quite understand the results.

What do the values in the list that numpy.fft.fft() returns mean? How do I work with it/interpret it?
What would be the max/min values of the Fourier transform performed on a list as described above?
How can I get to my end goal of a dictionary in the form { frequency : volume (0.0 - 1.0) }?

Thank you. Sorry if my lack of understanding of the Fourier transform made you facepalm.

score 4 · Accepted Answer · answered May 12 '14 at 05:24

Consider the FFT of a single period of a sine wave:

>>> import numpy as np
>>> t = np.linspace(0, 2*np.pi, 100)
>>> x = np.sin(t)
>>> f = np.fft.rfft(x)
>>> np.round(np.abs(f), 0)
array([  0.,  50.,   1.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.])

The FFT returns an array of complex numbers, one per frequency, each encoding the amplitude and phase of that frequency. Assuming you're only interested in the amplitude, I've used np.abs to get the magnitude for each frequency and rounded it to the nearest integer using np.round(__, 0). The spike at index 1 indicates a sine wave whose period equals the number of samples. The small value at index 2 is leakage: np.linspace includes the endpoint, so the 100 samples span slightly more than one period.
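To make the amplitude and phase concrete, here is a minimal sketch (not part of the original answer) that builds a cosine with a known amplitude and phase offset and reads both back from bin 1. The values 0.5 and pi/4 are arbitrary choices for illustration, and np.arange is used instead of np.linspace so that exactly one period fits the window.

```python
import numpy as np

n = 100
t = np.arange(n) * 2 * np.pi / n     # one full period, endpoint excluded
x = 0.5 * np.cos(t + np.pi / 4)      # illustrative amplitude 0.5, phase +pi/4

f = np.fft.rfft(x)

# A real sinusoid of amplitude A that fits the window exactly shows up as a
# bin of magnitude A * n / 2, so multiply by 2 / n to recover A.
amplitude = 2 * np.abs(f[1]) / n

# np.angle gives the phase in radians, measured relative to a cosine.
phase = np.angle(f[1])
```

Here amplitude should come out at about 0.5 and phase at about pi/4, matching the values used to build the signal.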

Now make the wave a bit more complex:

>>> x = np.sin(t) + np.sin(3*t) + np.sin(5*t)
>>> f = np.fft.rfft(x)
>>> np.round(np.abs(f), 0)
array([  0.,  50.,   1.,  50.,   0.,  48.,   4.,   2.,   2.,   1.,   1.,
         1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.])

We now see spikes at indices 1, 3 and 5, corresponding to our input: sine waves with periods of n, n/3 and n/5 (where n is the number of input samples).
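Building on this, one possible way to reach the { frequency : volume } dictionary from the question is sketched below. It is an illustration rather than the answer author's method: the helper name spectrum, the 8000 Hz sample rate, and the 440 Hz test tone are all assumptions made for the example. np.fft.rfftfreq converts bin indices to Hz, and the 2 / n scaling undoes the n / 2 factor seen in the spikes of 50 above (for samples in [-1, 1], no bin magnitude can exceed n).

```python
import numpy as np

def spectrum(samples, sample_rate):
    """Return {frequency_hz: amplitude} for a real-valued signal (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    magnitudes = np.abs(np.fft.rfft(samples))

    # Scale bin magnitudes back to sinusoid amplitudes.
    amplitudes = 2.0 * magnitudes / n
    amplitudes[0] /= 2.0            # the DC bin has no mirrored partner
    if n % 2 == 0:
        amplitudes[-1] /= 2.0       # neither does the Nyquist bin when n is even

    # Frequency in Hz of each rfft bin.
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return dict(zip(freqs.tolist(), amplitudes.tolist()))

# Assumed test input: one second at 8000 Hz holding a 440 Hz tone at half volume.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
volumes = spectrum(0.5 * np.sin(2 * np.pi * 440 * t), sample_rate)
loudest = max(volumes, key=volumes.get)
```

With a one-second window the bins are 1 Hz apart, so loudest comes out at about 440 Hz with a volume of about 0.5. When a tone does not fit the window a whole number of times, its energy spreads over neighbouring bins and the peak reads lower than the true amplitude.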

EDIT

Here's a good conceptual explanation of the Fourier transform: http://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/

The phase part of the result is what confused me. Thanks for the explanation and link. — anroesti, May 12 '14 at 06:54
One more question, is the maximum value after the transform always half the magnitude of the input? — anroesti, May 12 '14 at 07:04
