How to implement/perform DFT on a segment in python?

Question

I am trying to write a simple program in python that will calculate and display DFT output of 1 segment.

My signal is 3 seconds long, I want to calculate DFT for every 10ms long segment. Sampling rate is 44100. So one segment is 441 samples long.

Since I am still testing this and the original program is much larger (speech recognition), here is an isolated part for testing purposes that unfortunately behaves oddly. Either that, or I lack knowledge on the subject.

1. I read somewhere that DFT input should be rounded to a power of 2, so I arranged my array to 512 instead of 441. Is this true?
2. If I am sampling at a rate of 44100, at most I can reach a frequency of 22050Hz, and for a sample of length 512 (~441) at least 100Hz?
3. If 2. is true, then I can have all frequencies between 100Hz and 22050Hz in those 10ms segments, but a segment is only 512 (441) samples long and the output of fft returns an array of 256 (220) values. They cannot contain all 21950 frequencies in there, can they?
4. My first guess is that the values in the output of fft should be multiplied by 100, since 10ms is a 100th of a second. Is this good reasoning?

The following program for two given frequencies 1000 and 2000 returns two spikes on graph at positions 24 and 48 in the output array and ~2071 and ~4156 on the graph. Since ratio of numbers is okay (2000:1000 = 48:24) I wonder if I should ignore some starting part of the fft output?

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0, 1, 1/512.0)  # We create 512 long array

# We calculate here two sinusoids together at 1000hz and 2000hz
y = np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*2000*t)
n = len(y)
k = np.arange(n)

# Problematic part is around here, I am not quite sure what
# should be on the horizontal line
T = n/44100.0
frq = k/T
frq = frq[:n//2]  # keep only the first half of the bins


Y = np.fft.fft(y)
Y = Y[:n//2]
# Convert from complex numbers to magnitudes
iY = np.abs(Y)


plt.plot(frq, iY,  'r')
plt.xlabel('freq (HZ)')
plt.show()

SleuthEye · Accepted Answer · 2015-12-21T03:46:29.437

I read somewhere that the DFT input should be rounded to power of 2 so I arranged my array to 512 instead 441. Is this true?

The DFT is defined for all sizes. However, implementations of the DFT such as the FFT are generally much more efficient for sizes which can be factored into small primes. Some library implementations have limitations and do not support sizes other than powers of 2, but that isn't the case with numpy.

If I am sampling at a rate of 44100, at most I can reach frequency of 22050Hz and for sample of length 512(~441) at least 100Hz?

The highest frequency for even sized DFT will be 44100/2 = 22050Hz as you've correctly pointed out. Note that for odd sized DFT the highest frequency bin will correspond to a frequency slightly less than the Nyquist frequency. As for the minimum frequency, it will always be 0Hz. The next non-zero frequency will be 44100.0/N where N is the DFT length in samples (which gives 100Hz if you are using a DFT length of 441 samples and ~86Hz with a DFT length of 512 samples).
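As a quick check of those numbers, here is a minimal sketch that lists the bin frequencies for both DFT lengths using `np.fft.rfftfreq`. The variable names are only illustrative, and 44100Hz is the sampling rate from the question.

```python
import numpy as np

fs = 44100.0  # sampling rate in Hz, taken from the question

# Centre frequencies of the non-negative bins for each DFT length
freqs_441 = np.fft.rfftfreq(441, d=1.0 / fs)
freqs_512 = np.fft.rfftfreq(512, d=1.0 / fs)

for name, freqs in (("441", freqs_441), ("512", freqs_512)):
    print(name, "samples: first non-zero bin", freqs[1],
          "Hz, highest bin", freqs[-1], "Hz")
```

For the odd length 441 the highest bin lands at about 22000Hz, just below the Nyquist frequency, while for 512 it lands on 22050Hz.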

If 2) is true, then I can have all frequencies between 100Hz and 22050Hz in that 10ms segments, but the length of segment is 512(441) samples only, output of fft returns array of 256(220) values, they cannot contain all 21950 frequencies in there, can they?

First, there aren't 21950 frequencies between 100Hz and 22050Hz, since frequencies are continuous and not limited to integer values. That said, you are correct in your realization that the output of the DFT will be limited to a much smaller set of frequencies. More specifically, the DFT represents the frequency spectrum at discrete frequency steps: 0, 44100/N, 2*44100/N, ...

My first guess is that the values in output of FFT should be multiplied by 100, since 10ms is 100th of a second. Is this good reasoning?

There is no need to multiply the FFT output by 100. But if you meant multiples of 100Hz with a DFT of length 441 and a sampling rate of 44100Hz, then your guess would be correct.
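To illustrate the 441-sample case, the sketch below splits a 3-second signal into 10ms segments and takes the DFT of the first one. The 1000Hz test tone and the non-overlapping split via `reshape` are assumptions made for the example, not something prescribed by the question.

```python
import numpy as np

fs = 44100       # sampling rate in Hz
seg_len = 441    # 10 ms at 44100 Hz

# Hypothetical 3-second test signal: a single 1000 Hz tone
t = np.arange(3 * fs) / fs
signal = np.sin(2 * np.pi * 1000 * t)

# Consecutive, non-overlapping 10 ms segments: one segment per row
segments = signal.reshape(-1, seg_len)

# Magnitude spectrum of the first segment only
magnitudes = np.abs(np.fft.rfft(segments[0]))
freqs = np.fft.rfftfreq(seg_len, d=1 / fs)

peak_bin = int(np.argmax(magnitudes))
print(segments.shape, peak_bin, freqs[peak_bin])
```

Because 1000Hz is an exact multiple of the 100Hz bin spacing, the tone falls on a single bin (index 10) and no scaling of the output is needed to locate it.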

The following program for two given frequencies 1000 and 2000 returns two spikes on graph at positions 24 and 48 in the output array and ~2071 and ~4156 on the graph. Since ratio of numbers is okay (2000:1000 = 48:24) I wonder if I should ignore some starting part of the fft output?

Here the problem is more significant. As you declare the array

t = np.arange(0, 1, 1/512.0)  # We create 512 long array

you are in fact representing a signal with a sampling rate of 512Hz instead of 44100Hz. As a result the tones you are generating are severely aliased (to 24Hz and 48Hz respectively). This is further compounded by the fact that you then use a sampling rate of 44100Hz for the frequency axis conversion. This is why the peaks are not appearing at the expected 1000Hz and 2000Hz frequencies.
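The aliasing can be checked with a few lines. This is a small sketch, where `aliased_frequency` is a hypothetical helper written for this example and the FFT part reuses the signal from the question:

```python
import numpy as np

def aliased_frequency(f, fs):
    """Apparent frequency (Hz) of a real tone at f Hz when sampled at fs Hz."""
    f = f % fs               # fold into [0, fs)
    return min(f, fs - f)    # reflect into [0, fs/2]

print(aliased_frequency(1000, 512))  # 24
print(aliased_frequency(2000, 512))  # 48

# Same signal as in the question: 512 samples spread over one second
t = np.arange(0, 1, 1 / 512.0)
y = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 2000 * t)

# Indices of the two largest bins of the magnitude spectrum
mag = np.abs(np.fft.rfft(y))
top_two = sorted(int(i) for i in np.argsort(mag)[-2:])
print(top_two)
```

The two largest bins are 24 and 48, which matches the spike positions reported in the question.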

To represent 512 samples of a signal sampled at a rate of 44100Hz, you should instead use

t = np.arange(512) / 44100.0  # exactly 512 samples, spaced 1/44100 s apart

at which point the formula you used for the frequency axis would be correct (since it is based on the same 44100Hz sampling rate). You should then be able to see peaks near the expected 1000Hz and 2000Hz (the closest frequency bins of the peaks being at ~1033Hz and ~1981Hz).
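Putting the pieces together, a corrected version of the test program might look like the sketch below. It builds the time axis as `np.arange(n) / fs` so the array has exactly `n` samples, and it uses `np.fft.rfft` and `np.fft.rfftfreq` instead of slicing the full FFT by hand. The split at 1500Hz used to find one peak per tone is an assumption made for this example.

```python
import numpy as np

fs = 44100.0   # sampling rate in Hz
n = 512        # DFT length in samples

t = np.arange(n) / fs
y = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 2000 * t)

mag = np.abs(np.fft.rfft(y))          # magnitudes of the non-negative bins
frq = np.fft.rfftfreq(n, d=1 / fs)    # matching frequency axis in Hz

# Strongest bin below and above 1500 Hz, i.e. one peak per tone
low = frq < 1500
peak_low = int(np.argmax(np.where(low, mag, 0)))
peak_high = int(np.argmax(np.where(low, 0, mag)))

print(peak_low, frq[peak_low])
print(peak_high, frq[peak_high])
```

Passing `frq` and `mag` to `plt.plot` in place of the original arrays should then show the two peaks at the expected positions on the horizontal axis.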

score 1 · Answer 2 · answered Dec 21 '15 at 01:59

1) I read somewhere that DFT input should be rounded to power of 2 so I arranged my array to 512 instead 441. Is this true?

A power-of-two length is not strictly required, but it is usually the fastest case for FFT implementations. If you want one, just pad the input with zeros to reach 512.

2) If I am sampling at a rate of 44100, at most I can reach frequency of 22050hz and for sample of length 512(~441) at least 100hz ?

Yes, the highest frequency you can get is half the sampling rate. It's called the Nyquist frequency.

No, the lowest frequency bin you get (the first bin of the DFT) is called the DC component and reflects the average of the signal. The next lowest frequency bin in your case is 22050 / 256 = 86Hz, and then 172Hz, 258Hz, and so on until 22050Hz. You can get these frequencies with the numpy.fft.fftfreq() function.

3) If 2) is true, then I can have all frequencies between 100hz and 22050hz in that 10ms segments, but the length of segment is 512(441) samples only, output of fft returns array of 256(220) values, they cannot contain all 21950 frequencies in there, can they?

The DFT doesn't lose the original signal's data, but its frequency grid is coarse when the DFT size is small. You may zero-pad the input to make the DFT size larger, such as 1024 or 2048, which gives more closely spaced bins.
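A minimal sketch of the zero-padding idea, assuming a hypothetical 1040Hz tone that falls between two bins of a 441-point DFT. Passing `n=` to `np.fft.rfft` pads the input with zeros up to that length.

```python
import numpy as np

fs = 44100.0
t = np.arange(441) / fs              # one 10 ms segment
y = np.sin(2 * np.pi * 1040 * t)     # hypothetical tone between two 100 Hz bins

peaks = {}
for nfft in (441, 2048):
    mag = np.abs(np.fft.rfft(y, n=nfft))     # zero-pads y when nfft > len(y)
    frq = np.fft.rfftfreq(nfft, d=1 / fs)
    peaks[nfft] = float(frq[np.argmax(mag)])

print(peaks)
```

The padded transform reports a peak closer to 1040Hz because its bins are spaced about 21.5Hz apart instead of 100Hz. Note that padding only interpolates the spectrum of the same 441 samples, so it does not help to separate two tones that are closer together than the original segment can resolve.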

The DFT bin refers to a frequency range centered at each of the N output points. The width of the bin is sample rate/N, and it extends from: center frequency -(sample rate/N)/2 to center frequency +(sample rate/N)/2. In other words, half of the bin extends below each of the N output points, and half above it.

4) My first guess is that the values in output of fft should be multiplied by 100, since 10ms is 100th of a second. Is this good reasoning?

No, the value should not be multiplied if you want to preserve the magnitude.

The following program for two given frequencies 1000 and 2000 returns two spikes on graph at positions 24 and 48 in the output array and ~2071 and ~4156 on the graph. Since ratio of numbers is okay (2000:1000 = 48:24) I wonder if I should ignore some starting part of the fft output?

For real input, the DFT result is mirrored. In other words, for a 512-point DFT (output indices 0 to 511) your frequencies will be like this:

n   0   1   2    3    4    ...  255    256    257    ...  511
Hz  DC  86  172  258  344  ...  21964  22050  21964  ...  86
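The mirroring can be verified numerically. The sketch below uses a random real-valued signal as a stand-in (an assumption for the example) and checks that bin k and bin N-k are complex conjugates. `np.fft.fftfreq` labels the upper half of the bins with negative frequencies, which is another way of expressing the same mirror image.

```python
import numpy as np

fs = 44100.0
n = 512

rng = np.random.default_rng(0)
y = rng.standard_normal(n)           # any real-valued test signal

Y = np.fft.fft(y)
freqs = np.fft.fftfreq(n, d=1 / fs)  # bin frequencies, upper half negative

# For real input, bin k and bin n-k are complex conjugates of each other
k = np.arange(1, n // 2)
mirrored = bool(np.allclose(Y[k], np.conj(Y[n - k])))

print(freqs[1], freqs[256], freqs[511], mirrored)
```

Since the second half carries no extra information for real input, it is common to keep only the first N/2 + 1 bins, which is what `np.fft.rfft` returns.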
