I have a 9-minute audio file with a sampling rate of 16000 Hz, so the signal has 9*60*16000 = 8640000 samples in total. I am computing an STFT in Python (librosa package) and plotting the spectrogram. I know the frequency resolution of the spectrogram is equal to Fs (sampling frequency) / N (number of FFT points).

If I plot the spectrogram with n_fft=2048, it has a shape of (1025, 16876) and the x-axis shows 9 minutes. If I plot it with n_fft=16384, it has a shape of (8193, 2110) and the x-axis shows 1 minute and 10 seconds. I do not understand the relationship between the shape of the spectrogram and the time shown on the time axis. I also want to know how the time on the spectrogram axis relates to the actual time in the signal.

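import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

file = 'mm.wav'
v, sr = librosa.load(file, sr=16000)
# Magnitude and phase of the STFT
t, phase = librosa.magphase(librosa.stft(v, n_fft=2048))
fig = plt.figure()
librosa.display.specshow(librosa.power_to_db(t, ref=np.max), y_axis='linear', x_axis='time', sr=sr)
print(t.shape)
fig.savefig('2048.png')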

Spectrogram for n_fft=16384

Spectrogram for n_fft=2048

Zahra

3 Answers

In general, a spectrogram is built from many (possibly overlapping) short-time FFT frames, and the time in the plot is proportional to the time in the signal. Your issue looks like it can be solved by scaling the x-axis up by a factor of 8 (n_fft/2048 = 16384/2048), though off the top of my head I don't know exactly why.
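
A possible explanation for the factor of 8, assuming librosa's documented defaults: `librosa.stft` uses `hop_length = n_fft // 4` when none is given, while `librosa.display.specshow` assumes `hop_length=512` when it labels the time axis. The sketch below is plain arithmetic, with helper names made up for illustration, and compares the duration the axis would show with the duration the frames actually cover.

```python
SR = 16000          # sampling rate from the question
SPECSHOW_HOP = 512  # assumed default hop_length of librosa.display.specshow


def stft_default_hop(n_fft):
    """Hop that librosa.stft is assumed to use when hop_length is not given."""
    return n_fft // 4


def axis_duration(n_frames, hop_length, sr=SR):
    """Duration in seconds implied by n_frames spaced hop_length samples apart."""
    return n_frames * hop_length / sr


# Frame counts are taken from the shapes reported in the question.
for n_fft, n_frames in [(2048, 16876), (16384, 2110)]:
    hop = stft_default_hop(n_fft)
    shown = axis_duration(n_frames, SPECSHOW_HOP)  # what the plot would label
    actual = axis_duration(n_frames, hop)          # what the frames really span
    print(f"n_fft={n_fft}: shown={shown:.2f} s, actual={actual:.2f} s, "
          f"ratio={hop // SPECSHOW_HOP}")
```

If those defaults hold, passing `hop_length=n_fft // 4` to `specshow` should make the time axis cover the full 9 minutes without any manual rescaling.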

joshwilsonvu

The frequency resolution of each frequency bin is

freq resolution per bin = ( sampling_freq ) / number_of_samples

where number_of_samples is the number of samples fed to each FFT. Be aware that the full (two-sided) frequency plot is mirrored around the

Nyquist_Limit = (sampling_freq) / 2

Because the mirrored values match, truncate at this limit and fold the upper half onto the lower half, which effectively doubles the values to the left of the limit.
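
As a rough check of these formulas against the numbers in the question, here is a small sketch. The helper names are hypothetical, and it assumes a real-valued signal so that only the bins from 0 Hz up to the Nyquist limit are kept.

```python
def frequency_resolution(sampling_freq, n_fft):
    """Spacing between adjacent FFT bins, in Hz."""
    return sampling_freq / n_fft


def one_sided_bins(n_fft):
    """Bins from 0 Hz up to and including the Nyquist limit."""
    return n_fft // 2 + 1


sampling_freq = 16000
nyquist_limit = sampling_freq / 2

for n_fft in (2048, 16384):
    print(f"n_fft={n_fft}: "
          f"{frequency_resolution(sampling_freq, n_fft)} Hz per bin, "
          f"{one_sided_bins(n_fft)} bins up to {nyquist_limit} Hz")
```

The bin counts correspond to the first dimension of the shapes reported in the question, (1025, ...) and (8193, ...).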

Scott Stensland
  • I understand the frequency resolution. What I do not understand is the "time frame": what does the number of time frames depend on, and how can we calculate it? – Zahra Jul 17 '19 at 16:15
In seconds, the length of each time frame = FFT length / sample rate. So in your first example, each frame spans (2048/16000) = 0.128 seconds; your audio is 540 seconds, so if the FFTs didn't overlap you would have a total of audio length / frame length = (540 sec / 0.128 sec per frame) = 4218.75 frames in the clip. Now make one correction for the overlap of successive FFTs: it looks like each FFT starts 25% of a frame length after the previous one (that is, 75% overlap), so it takes 4x as many frames to cover the whole audio: 4218.75 becomes about 16875 frames, plus 1, which matches the 16876 in your shape.
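
The count can be checked with a few lines. This is a sketch under two assumptions: the hop between frames is `n_fft // 4`, and the signal is padded so that frames are centred (librosa's `center=True` default), which is where the extra 1 would come from. The function name is made up for illustration.

```python
def n_frames_centered(n_samples, hop_length):
    """Number of STFT frames when the signal is padded so frames are centred."""
    return 1 + n_samples // hop_length


sample_rate = 16000
n_samples = 9 * 60 * sample_rate  # 8640000 samples, as in the question

for n_fft in (2048, 16384):
    hop_length = n_fft // 4               # assumed hop: 25% of the FFT length
    frame_seconds = n_fft / sample_rate   # time span covered by one FFT
    hop_seconds = hop_length / sample_rate  # time step between frames
    frames = n_frames_centered(n_samples, hop_length)
    print(f"n_fft={n_fft}: frame={frame_seconds} s, hop={hop_seconds} s, "
          f"frames={frames}")
```

Under these assumptions the frame counts equal the second dimension of the two shapes in the question, so frame index `i` corresponds to roughly `i * hop_length / sample_rate` seconds in the signal.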