I am doing audio analysis in Python. My end goal is to get a list of frequencies and their respective volumes, like { frequency : volume (0.0 - 1.0) }.
I have my audio data as a list of frames with values between -1.0 and +1.0. I used NumPy's Fourier transform on this list, numpy.fft.fft(), but the resulting data makes no sense to me.
I do understand that the Fourier transform converts a signal from the time domain to the frequency domain, but not quite how it works mathematically. That's why I don't quite understand the results.
- What do the values in the list that numpy.fft.fft() returns mean? How do I work with and interpret them?
- What would the max/min values of the Fourier transform of a list as described above be?
- How can I get to my end goal of a dictionary in the form { frequency : volume (0.0 - 1.0) }?
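
To make the goal concrete, here is a rough sketch of the kind of function I am after. I am not sure it is the right approach. It assumes the sample rate is known and that the input is real-valued, so it uses numpy.fft.rfft() and numpy.fft.rfftfreq(). The name frequency_volumes, the 2/n scaling, and the final clipping to 1.0 are my own assumptions, not something I took from the NumPy docs.

```python
import numpy as np

def frequency_volumes(frames, sample_rate):
    """Map each frequency bin (in Hz) to a magnitude scaled into 0.0-1.0.

    Hypothetical helper: the name and the scaling are assumptions.
    """
    samples = np.asarray(frames, dtype=float)
    n = len(samples)

    # For real input, rfft returns only the non-negative frequency bins
    # (n // 2 + 1 complex values).
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # The magnitude of each complex value is the strength of that frequency.
    # Scaling by 2/n makes a sine of amplitude A, centred on a bin, read as A.
    magnitudes = np.abs(spectrum) * 2.0 / n
    magnitudes[0] /= 2.0          # the DC bin has no mirrored partner
    if n % 2 == 0:
        magnitudes[-1] /= 2.0     # neither does the Nyquist bin

    # Non-sinusoidal input (e.g. a square wave) can push a bin above 1.0,
    # so clip to keep every value inside the requested range.
    magnitudes = np.clip(magnitudes, 0.0, 1.0)

    return dict(zip(freqs.tolist(), magnitudes.tolist()))
```

With one second of audio the bins come out about 1 Hz apart, so a pure tone should show up as a single dominant key in the dictionary.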
Thank you. Sorry if my lack of understanding of the Fourier transform made you facepalm.