
I'm using scipy.signal.stft to get the STFT of an audio signal. That works fine and I'm getting results. What I don't understand is the output shape: for an audio signal of 169600 samples with a sample rate of fs=44100 Hz, I get three return values, f, t and Zxx, and the shape of Zxx is (2049, 84).

To calculate the STFT I use a window of size 4096 with a Hann (hanning) window. By default, scipy.signal.stft uses an overlap of window_size // 2 samples between frames.

My question: are there 2049 overlapping frames? Either way, how do I calculate the number of overlapping frames in the STFT? And if 2049 is not the number of overlapping frames, what does that number mean?

Ashan Priyadarshana

1 Answer


The FFT of a real-valued signal yields a spectrum with Hermitian symmetry. That means the upper half of the spectrum can be obtained from the lower half. Also, when the FFT size N is even, the midpoint (Nyquist) bin is its own mirror image. As a result, the spectrum is fully determined by N//2 + 1 frequency points, which is the size of the spectrum returned by scipy.signal.stft. In your case N is 4096, so you get a spectrum of 4096//2 + 1 = 2049 points along the frequency axis. So 2049 is the number of frequency bins, not the number of frames; the number of frames is the second dimension of Zxx, 84. You should be able to confirm that f is indeed an array of 2049 frequency values (from 0 to 44100/2 Hz in increments of 44100/4096, or about 10.77 Hz).
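As a quick sanity check of the arithmetic above, here is a minimal sketch in plain Python (no scipy needed). The names `fs`, `nfft`, `n_bins`, `bin_spacing` and `freqs` are mine, chosen for illustration; the list it builds should correspond to what the returned f array contains under the settings in your question.

```python
# Sketch: frequency axis of a one-sided spectrum for a real-valued signal.
fs = 44100    # sample rate in Hz (from the question)
nfft = 4096   # FFT size, equal to the window size here

n_bins = nfft // 2 + 1   # one-sided spectrum: DC up to and including Nyquist
bin_spacing = fs / nfft  # distance between adjacent frequency bins in Hz

# Bin k sits at k * fs / nfft Hz.
freqs = [k * fs / nfft for k in range(n_bins)]

print(n_bins)                 # number of rows expected in Zxx
print(round(bin_spacing, 2))  # bin spacing in Hz
print(freqs[0], freqs[-1])    # lowest and highest frequency on the axis
```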

As far as the number of time values is concerned, you can compute it as

number_of_samples = 169600
window_size = 4096
hop_size = window_size - window_size//2   # window size minus the default overlap
number_time_values = (number_of_samples + window_size)//hop_size
#  = (169600 + 4096)//(4096 - 2048)
#  = 84

The +window_size (+4096 in your case) term in the numerator is due to the default boundary='zeros' option, which pads the input with window_size//2 zeros before and window_size//2 zeros after your actual 169,600 input samples.
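If you want a count that also accounts for the default padded=True option, the following sketch mirrors the padding steps as I understand them from reading scipy's implementation. Treat it as an assumption rather than a documented formula: `stft_frame_count` is a hypothetical helper, not a scipy function, and the simple expression above may come out one higher than this sketch when the number of samples is an exact multiple of the hop size.

```python
def stft_frame_count(n_samples, nperseg, noverlap=None, boundary=True, padded=True):
    """Estimate the number of STFT frames (hypothetical helper, not part of scipy).

    Assumes the padding works as follows: nperseg // 2 samples are added on
    each side when a boundary extension is used, then zeros are appended so
    that the segments fit the extended signal exactly.
    """
    if noverlap is None:
        noverlap = nperseg // 2          # default overlap of half a window
    step = nperseg - noverlap            # hop size between consecutive frames

    length = n_samples
    if boundary:
        length += 2 * (nperseg // 2)     # extension before and after the signal
    if padded:
        # zeros appended so that (length - nperseg) is a multiple of step
        length += (-(length - nperseg) % step) % nperseg

    return (length - noverlap) // step


print(stft_frame_count(169600, 4096))    # 84
```

With the values from the question this gives 84, matching the second dimension of Zxx. You can compare it against Zxx.shape[1] for other signal lengths to check that the assumptions hold for your scipy version.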

SleuthEye