
I'm trying to understand the output of librosa.feature.melspectrogram:

>>> import numpy as np
>>> from librosa.feature import melspectrogram
>>> # recent librosa (>= 0.10) only accepts the audio as the keyword argument y=
>>> melspectrogram(y=np.random.randn(128), n_mels=128).shape
(128, 1)
>>> melspectrogram(y=np.random.randn(900), n_mels=128).shape
(128, 2)
>>> melspectrogram(y=np.random.randn(500), n_mels=128).shape
(128, 1)
>>> melspectrogram(y=np.random.randn(512), n_mels=128).shape
(128, 2)
>>> melspectrogram(y=np.random.randn(511), n_mels=128).shape
(128, 1)
>>> melspectrogram(y=np.random.randn(1023), n_mels=128).shape
(128, 2)
>>> melspectrogram(y=np.random.randn(1024), n_mels=128).shape
(128, 3)
>>> melspectrogram(y=np.random.randn(2055), n_mels=128).shape
(128, 5)
>>> melspectrogram(y=np.random.randn(2047), n_mels=128).shape
(128, 4)

What determines the second value of its shape? The first one is clear (it's n_mels), but I can't work out from the docs where the second one comes from.

ignoring_gravity
  • It's the length of the signal in *frames* (not samples), which depends on the hop length and, with center=False, also on the window length (n_fft). See [this answer](https://stackoverflow.com/a/62733609/942774) for more details. – Hendrik Aug 04 '20 at 06:30
  • Yes, that makes sense, thanks: it's `1 + len(y) // hop_length` – ignoring_gravity Aug 04 '20 at 07:04
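A minimal sketch of where that formula comes from, assuming librosa's defaults (n_fft=2048, hop_length=512, center=True). With center=True the signal is padded by n_fft // 2 samples on each side before framing, so for an even n_fft the window length cancels out. The helper `n_frames` is hypothetical and not part of librosa; it only mirrors the framing arithmetic.

```python
def n_frames(n_samples: int, n_fft: int = 2048, hop_length: int = 512,
             center: bool = True) -> int:
    """Number of STFT frames (spectrogram columns) for n_samples samples.

    Hypothetical helper mirroring the framing arithmetic, not a librosa API.
    """
    if center:
        # centered frames: the signal is padded by n_fft // 2 on each side
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one window")
    # one frame at offset 0, plus one more for every full hop that still fits
    return 1 + (n_samples - n_fft) // hop_length


# signal lengths and observed column counts from the question
observed = {128: 1, 900: 2, 500: 1, 512: 2, 511: 1,
            1023: 2, 1024: 3, 2055: 5, 2047: 4}
for length, cols in observed.items():
    assert n_frames(length) == cols == 1 + length // 512
```

With center=False the same helper reduces to 1 + (len(y) - n_fft) // hop_length, so the window length does matter there, and a signal shorter than n_fft has no complete frame at all.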

1 Answer

It's the length of the signal in frames (not samples), which depends on the hop length and, with center=False, also on the window length (n_fft). See this answer.

Concretely, with the default center=True: 1 + len(y) // hop_length, where hop_length defaults to 512.
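To see the frames themselves, here is a rough NumPy sketch of the framing step that happens before the mel projection. It assumes zero padding and the defaults n_fft=2048 and hop_length=512, needs NumPy >= 1.20 for `sliding_window_view`, and skips the windowing, FFT, and mel filterbank, which change only the number of rows, not the number of columns. `frame_signal` is a hypothetical helper, not librosa's implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frame_signal(y: np.ndarray, n_fft: int = 2048, hop_length: int = 512) -> np.ndarray:
    """Split y into centered, overlapping frames; returns shape (n_fft, n_frames).

    Hypothetical sketch of STFT framing with zero padding (the padding
    mode does not affect how many frames there are).
    """
    # pad n_fft // 2 zeros on each side so frame t is centered on sample t * hop_length
    padded = np.pad(y, n_fft // 2, mode="constant")
    # all length-n_fft windows, keeping only one every hop_length samples
    windows = sliding_window_view(padded, n_fft)[::hop_length]
    return windows.T  # frames as columns, matching the (n_mels, n_frames) layout


y = np.random.randn(2055)
frames = frame_signal(y)
print(frames.shape)  # (2048, 5)
```

Each column of `frames` would become one column of the mel spectrogram after windowing, an FFT (n_fft // 2 + 1 frequency bins), and projection onto the n_mels filters.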

Hendrik