I am trying to analyze the frequencies of a song at certain points of time held inside an array.

I am using the scipy.signal.spectrogram function to generate those frequencies. The length of the song is 2:44, or 164 seconds, and the sampling rate returned by scipy.io.wavfile.read is 44100.

When I use spectrogram:

f, t, Sxx = signal.spectrogram(data[:, 1], sr)

The length of f is really small, only 129 elements. t is longer, at 32322 elements, but that is still a long way from the 7240320 samples returned by wavfile.read.

(data[:, 1] is the right channel of the audio data)

2 Answers

The length of f is the default nperseg of the STFT, 256, divided by 2 (only the positive side of the frequency scale) plus 1 (frequency 0), which gives 129. The number of samples in time is given by

t.size = len(data[:, 1]) / nperseg * (1 + noverlap) 

where noverlap is 256/8=32.

Syscall
Gideon Kogan
  • I figured it out like 3 weeks ago, but thanks for the answer! – Blahmastah May 10 '18 at 19:16
  • This answer seems very wrong. When I create a test spectrogram with `f, t, Sxx = spectrogram(np.ones(1000))` where the shape should be (129,4) based on `nperseg = 256` and `noverlap = 32` which are the defaults, the calculated shape for the time dimension using above is 128.90625 -> 128 while the actual time shape is 4. Using the equation from below, I get the correct answer of 4. – ZachS Sep 17 '21 at 01:29
  • I was referring to his question. Obviously, it is not the most general case. Mind that your example is not the usual case either, since your signal length is not a multiple of a power of 2. – Gideon Kogan Sep 19 '21 at 10:56
  • Your equation does not even work for his question. I can create a dummy array with `data = np.zeros(7240320)`, which is the size of the vector that he had in his question. Using his same code, I get `t.shape = (32322,)` and `f.shape = (129,)`, which matches the shapes in his question. Now, if I use your equation, I get `len(data) / 256 * (1 + 32) = 933322.5`. If I use the equation from the other answer, I get `int((len(data) - 32) / (256 - 32)) = 32322`, which matches the output of the spectrogram. I would highly recommend that you actually run the code to check your answer. – ZachS Sep 23 '21 at 22:32

The frequency array f is limited by half of nperseg plus the zero frequency, so

f.size = int(1 + nperseg / 2)

while the time array is limited by the amount of segments you can extract from the data array based on nperseg and noverlap, like so

t.size = int((len(data[:, 1]) - noverlap) / (nperseg - noverlap))

It's easier to understand this if you imagine that, to get two segments with nperseg=8 and noverlap=1, you need a signal with at least 15 samples: the first segment uses 8 samples, and the second reuses 1 of them and adds 7 new ones.
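The two size formulas above can be checked with a few lines of plain Python. This is only a sketch: `expected_spectrogram_shape` is a hypothetical helper name, not part of SciPy, and it assumes the defaults discussed in this thread (`return_onesided=True`, `nfft` equal to `nperseg`, and `noverlap = nperseg // 8` when not given).

```python
def expected_spectrogram_shape(n_samples, nperseg=256, noverlap=None):
    """Return the expected (f.size, t.size) for a one-sided spectrogram.

    Hypothetical helper for illustration; it only mirrors the two
    formulas from the answer and does not call SciPy.
    """
    if noverlap is None:
        # Default overlap assumed here: one eighth of the segment length.
        noverlap = nperseg // 8
    if not 0 <= noverlap < nperseg:
        raise ValueError("noverlap must satisfy 0 <= noverlap < nperseg")
    if n_samples < nperseg:
        raise ValueError("this sketch assumes n_samples >= nperseg")

    # One-sided spectrum: zero frequency plus half of the segment length.
    n_freqs = nperseg // 2 + 1
    # Each new segment advances by (nperseg - noverlap) samples.
    n_times = (n_samples - noverlap) // (nperseg - noverlap)
    return n_freqs, n_times


# Signal length from the question, with the default parameters.
print(expected_spectrogram_shape(7240320))  # (129, 32322)

# The small example above: two segments need at least 15 samples.
print(expected_spectrogram_shape(15, nperseg=8, noverlap=1))  # (5, 2)
```

Note that the sampling rate does not appear anywhere in the helper: it only changes the values stored in f and t, not their sizes.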

ZachS
Bruno
  • That's considering `return_onesided=True`, which is the default. – Bruno Sep 08 '19 at 12:11
  • Note that the size of the frequency vector should be `int(1 + nperseg / 2)`, as in Gideon Kogan's answer. In the given answer, frequency size is said to be related to sample frequency, but sample frequency does not play a role in the size of the vector, just the mapping of the vector onto frequencies. You can run the example `f, t, Sxx = signal.spectrogram(np.ones(7240320), sr)` with `sr = 44100` and `sr = 1` and see that the size of f does not change. – ZachS Sep 23 '21 at 22:45