
I was under the impression that Mel-spectrograms were simply spectrograms with the mel scale as the y axis. However, I recently read this line in a research paper: "Data representations such as Mel-Spectrograms can be seen from two different perspectives: either as an image, or as an audio sequence." What does this mean? It seems to imply that Mel-spectrograms are not just spectrograms, but can be interpreted in another way. If so, what exactly is that interpretation, and how can it be applied?

cchoi1022

1 Answer


Spectrograms are 2-dimensional data, with the axes being Time and Frequency. There is 1 channel, which is the Energy/Power at a given Time-Frequency bin.

Images are also 2-dimensional data, where the axes are spatial extent (X/Y). If the image is grayscale, it also has just 1 channel.

Since many signal processing approaches do not particularly care about the meaning of the axes, one can apply many image processing techniques to spectrograms, and this can be quite useful.
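As a rough illustration of treating a spectrogram as a grayscale image, here is a minimal sketch using only NumPy. The helper names `stft_power` and `box_blur`, the test tone, and the frame/hop sizes are all assumptions made up for this example; a 3x3 mean filter stands in for "an image processing technique", and any other 2D filter could be used in the same way.

```python
import numpy as np

def stft_power(signal, frame_len=256, hop=128):
    """Power spectrogram with shape (frequency_bins, time_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames: (time, samples)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # (time, freq)
    return (np.abs(spectrum) ** 2).T         # (freq, time), one channel

def box_blur(image, size=3):
    """Mean filter over a size x size neighbourhood (reflect-padded edges)."""
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy : dy + image.shape[0], dx : dx + image.shape[1]]
    return out / size**2

# Hypothetical test signal: a 1 kHz tone in white noise, 1 second at 8 kHz
sr = 8000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1000 * t) + 0.5 * rng.standard_normal(sr)

spec = stft_power(signal)                 # 2D array, like a grayscale image
log_spec = 10 * np.log10(spec + 1e-10)    # decibel scale
smoothed = box_blur(log_spec)             # the blur ignores what the axes mean

print(spec.shape, smoothed.shape)
```

The blur treats rows and columns identically, which is exactly the "image" view: the filter has no notion that one axis is time and the other is frequency.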

There is, however, nothing Mel-specific about this. It applies equally to a linear (STFT) spectrogram, a chromagram, or any other time-frequency representation.

Jon Nordby