Signal Processing: Can someone explain the different types of spectrogram?

Question

I'm a newbie to signal processing. I searched Google for many spectrogram terms, but I couldn't find anything that explains the differences between the types of spectrogram. Can anyone help explain the definition and meaning of the different spectrograms in the picture below, please? Thanks!

spectrogram

P.S.: And what is the difference between a spectrogram and chroma? What is chroma used for, and when?

chroma

mins · Accepted Answer · 2023-06-22T12:01:21.340

You asked to clarify two terms: Spectrogram and chroma.

A spectrogram is a visualization of the frequency spectrum, that is, a breakdown of the sound into pure sinusoids of different frequencies. It shows how the amplitude of the different frequencies varies over time. This can be shown on a 2D plot (alternatively a 3D plot) where x is time, y is frequency, and color denotes the amplitude of each frequency component found in the sound:

Voice spectrogram (source)

In these plots, the axes can be linear or logarithmic, and the frequency axis can even be labeled with note names instead of actual frequencies, since each note corresponds to a frequency. When the axis is reduced to the 12 note names without octave (pitch classes), the plot is called a chromagram. See the sections further below for details about the plots used in audio analysis.

An octave is any frequency range from f to 2*f. Each octave can be divided into seven intervals, using 8 notes. For an octave starting at C: C, D, E, F, G, A, B, C. These degrees form the (C major) diatonic scale, the scale we all learned at school:

An interval is measured as the ratio of the frequencies of its two notes. Five intervals have the same value, a tone, and the two others, E-F and B-C, are only half of this value, a semitone. This division is found in all octaves, as doubling or halving the frequencies doesn't change the ratios. On a piano keyboard, these notes are the white keys.

There is another scale, which divides the octave into 12 equal intervals, using 13 notes. This is the chromatic scale; chroma simply refers to these notes:

The notes of the chromatic scale are the notes of the previous scale plus the notes that split each full-tone interval into two equal intervals of a semitone. On a keyboard, these added notes are the black keys.
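To make the two scales concrete, here is a minimal sketch that builds one octave of the chromatic scale and picks the C major notes out of it. It assumes equal temperament (each semitone is a ratio of 2^(1/12)) and a C4 of about 261.63 Hz, which follows from the common A4 = 440 Hz convention; the function and variable names are only illustrative.

```python
# One semitone in equal temperament: 12 equal ratios make up one octave (x2).
SEMITONE = 2 ** (1 / 12)

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B", "C"]

def chromatic_scale(start_hz):
    """Return the 13 frequencies from start_hz up to its octave (2 * start_hz)."""
    return [start_hz * SEMITONE ** k for k in range(13)]

# Assumption: C4 is about 261.63 Hz when A4 is tuned to 440 Hz.
C4_HZ = 261.63
scale = chromatic_scale(C4_HZ)

# Positions of the C major (white key) notes inside the chromatic scale.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]
c_major = [(NOTE_NAMES[k], scale[k]) for k in MAJOR_STEPS]

# Interval sizes in semitones: five tones (2) and two semitones (1).
step_sizes = [b - a for a, b in zip(MAJOR_STEPS, MAJOR_STEPS[1:])]
```

Here `step_sizes` comes out as `[2, 2, 1, 2, 2, 2, 1]`, which is the tone/semitone pattern described above, with the semitones at E-F and B-C.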

Music, except in rare cases, is not composed using the full chromatic scale (all semitones). Instead, a diatonic scale with more full tones than semitones is built from the pool of chromatic notes, by selecting a starting note and a scheme for the intervals to be used. Two interval schemes are in common use today: major and minor. With 12 possible starting notes, there are 24 possible diatonic scales.

Chroma: Big word for a trivial concept

As seen above, chroma, chroma analysis and chroma feature sound like big business, but there is nothing to worry about: chroma is just a fancy word for a note (pitch class) of the chromatic scale, the ordinary set of notes used in Western music.

Spectrogram

The spectrogram is a 3D representation: the x axis is time, the y axis is frequency, and the z axis is generally amplitude or power (power is proportional to the square of amplitude). The z value is indicated by the color of the pixel at grid point (x, y).
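As a rough illustration of where the (x, y, z) grid comes from, the sketch below cuts a signal into overlapping frames and takes the power of a plain DFT of each frame. It is deliberately naive and slow, uses only the standard library, and the frame length, hop size and Hann window are assumptions chosen for the example rather than anything prescribed by the plots discussed here.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Return power[frame_index][bin_index] for a real-valued signal.

    frame_index maps to time (x), bin_index to frequency (y),
    and the stored value is the power (z).
    """
    # Hann window to reduce leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Only bins 0..frame_len/2 are needed for a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)  # power = squared magnitude
        rows.append(row)
    return rows

# Example: a 100 Hz sine sampled at 800 Hz.
SAMPLE_RATE = 800
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone, FRAME_LEN, hop=32)

# Bin k corresponds to the frequency k * SAMPLE_RATE / FRAME_LEN.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

In this example the strongest bin of every frame sits at 100 Hz, so the plot would show a single horizontal line. In practice an FFT-based library routine would replace the inner loops.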

Any axis can be made logarithmic; for the z axis this is usually done with decibels. For a power scale the transformation is dB = 10 log10(P/P0), where P0 is a reference value, 1 unless otherwise specified. Doubling the power adds about 3 dB. As power ratios are the square of amplitude ratios, the decibel value for amplitude is dB (amplitude) = 20 log10(A/A0).
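The two decibel formulas can be checked with a few lines of code. This is a minimal sketch; the function names are made up for the example and the reference value defaults to 1 as in the text.

```python
import math

def power_to_db(power, ref=1.0):
    """Decibel value of a power ratio: 10 * log10(P / P0)."""
    return 10.0 * math.log10(power / ref)

def amplitude_to_db(amplitude, ref=1.0):
    """Decibel value of an amplitude ratio: 20 * log10(A / A0)."""
    return 20.0 * math.log10(amplitude / ref)

doubling = power_to_db(2.0)  # about 3.01 dB

# An amplitude and its square (the power) give the same dB value.
amp = 0.5
db_from_amplitude = amplitude_to_db(amp)
db_from_power = power_to_db(amp ** 2)
```

Because the factor 20 already accounts for the squaring, `db_from_amplitude` and `db_from_power` agree, which is why an amplitude spectrogram and a power spectrogram look the same once both are shown in dB.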

The graph below shows the power (z as gray scale) expressed in dB for the frequency y (Hz) at time x (x scale is not shown).

The same with gray shades replaced by colors:

The next graph is identical, except the y scale is logarithmic instead of linear, which makes more sense if energy is concentrated at the beginning of the scale (low frequencies), like here under 1 kHz:

This next graph is the same. From the title it seems power is shown instead of amplitude, but visually there is no color difference:

The next graph is similar, except the "constant Q" title likely means power values are computed using a constant-Q transform (CQT):

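Using the CQT (instead of the usual discrete Fourier transform) might be an attempt to extract the notes from the signal more accurately: its frequency bins are geometrically spaced, like musical notes, rather than evenly spaced in Hz.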
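The difference in bin layout can be sketched without computing any transform. The snippet below only lists the bin center frequencies: evenly spaced for a DFT, geometrically spaced for a constant-Q analysis. The starting frequency (C1, about 32.70 Hz), the sample rate and the 12 bins per octave are assumptions picked for the example.

```python
def dft_bin_frequencies(sample_rate, n_fft):
    """Center frequencies of DFT bins: a constant spacing in Hz."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]

def cqt_bin_frequencies(f_min, n_bins, bins_per_octave=12):
    """Center frequencies of constant-Q bins: a constant ratio between bins."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]

def q_factor(bins_per_octave=12):
    """Q = center frequency / bandwidth, identical for every bin."""
    return 1.0 / (2 ** (1 / bins_per_octave) - 1)

dft_bins = dft_bin_frequencies(sample_rate=22050, n_fft=2048)
cqt_bins = cqt_bin_frequencies(f_min=32.70, n_bins=25)  # two octaves from C1

dft_spacing_hz = dft_bins[1] - dft_bins[0]  # same between any two neighbors
cqt_ratio = cqt_bins[1] / cqt_bins[0]       # same between any two neighbors
```

With 12 bins per octave each constant-Q bin lands on one semitone, while the DFT spends the same number of Hz per bin everywhere, which is coarse for low notes and wasteful for high ones.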

The same data are shown in the graph below, but y is labeled with note names instead of frequencies:

Chromagram

The chromagram is a specific spectrogram in which both the y axis and the z values are defined in a particular way.

The y scale includes only the 12 notes of the chromatic scale.
The z value is the sum of all sounds that correspond to each note, regardless of the octave, so C is the sum of C0 (C in octave 0), plus C1 (twice the frequency of C0), plus C2 (twice the frequency of C1), etc. These notes are all harmonics of C0.

You may wonder why the octaves are summed, losing the actual frequency information. The reason is specific to musical sounds, that is, sounds produced by resonant devices. When such a device produces a sound of frequency f, it also produces sounds at multiples of f (harmonics at 2f, 3f, 4f), whose individual strengths are determined by its timbre. The harmonics at 2f, 4f, 8f, etc. are octaves of f, so summing octaves collects them under the same note.

In addition, as explained in the introductory section about music scales, when a piece of music is created, a scale is selected. This choice fixes the 7 notes used for the piece, whatever the octave. The isolated use of foreign notes (accidentals) makes them less frequent in the piece, and therefore weaker in the chromagram.

A chromagram:

What the z axis represents is not stated; possibly it is the amplitude (or power) relative to the maximum found in the signal (around note E).

The last graph is different in that the y axis doesn't show signal pitches but the tempo (beats per minute) of the sample.

Tempogram

The scale is logarithmic. The color indicates how often this number of BPM is detected. More than one BPM value is detected because some notes are shorter than a beat, so they repeat at rates higher than the actual BPM. Usually the algorithm used to perform the analysis also provides the most probable BPM, taking the onset distribution into account (e.g. librosa).
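A toy example shows why several BPM values appear. The sketch below simply converts the gaps between note onsets into BPM candidates and counts them; it is not the algorithm used by librosa or any other library, and the onset times are invented for the illustration.

```python
from collections import Counter

def bpm_candidates(onset_times_s):
    """Count the BPM values implied by the gaps between consecutive onsets."""
    intervals = [b - a for a, b in zip(onset_times_s, onset_times_s[1:])]
    return Counter(round(60.0 / dt) for dt in intervals if dt > 0)

# Invented onsets: a beat every 0.5 s (120 BPM), with the last beat
# split into two notes of 0.25 s each.
onsets = [0.0, 0.5, 1.0, 1.5, 1.75, 2.0]
counts = bpm_candidates(onsets)

most_probable_bpm = counts.most_common(1)[0][0]
```

The shorter notes show up as a 240 BPM candidate next to the 120 BPM one, but 120 is the most frequent value, which is the kind of decision the "most probable BPM" output reflects.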

score 1 · Answer 2 · answered Jan 15 '18 at 16:54

I assume you have already looked at Wikipedia: https://en.wikipedia.org/wiki/Spectrogram

Do not be confused by the spectrogram names; they are named either after what they represent or after how they represent it. There is a lot of reading to do in order to fully understand spectrograms. Start with this: http://www.phon.ucl.ac.uk/courses/spsci/acoustics/week1-10.pdf

Linear or log denotes linear or logarithmic scaling. Some explanations are here: http://manual.audacityteam.org/man/spectrogram_view.html

A power spectrogram example is discussed here. For this you have to understand power spectral density: https://www.mathworks.com/matlabcentral/answers/122472-how-to-get-the-power-spectral-density-from-a-spectrogram-in-a-given-frequency-range?s_tid=gn_loc_drop

Constant-Q is a time-to-frequency domain transform, as explained here: https://en.wikipedia.org/wiki/Constant-Q_transform It is different from the FFT.

Grayscale just means the spectrogram is drawn in shades of gray, which can make it easier to read.

A tempogram is a visual representation of the tempo of an audio signal containing music. One example of a toolbox that computes it is here: https://www.audiolabs-erlangen.de/resources/MIR/tempogramtoolbox/

Chroma is the technical term used in acoustics to represent the 'color' of the sound, as explained here: http://acousticslab.org/psychoacoustics/PMFiles/Module05.htm#7b

"Pitch chroma: The distinctive quality of a specific tone, separating it from the rest of the tones within an octave. It describes perceptual 'differences'/'distances' of pitches within an octave and the perceptual sameness of pitches separated by one or more full octaves. It is reflected in the fact that the different note names (e.g. C, D, E, F, G, A, B, C, D ...) repeat periodically for every 2/1 increase in frequency (i.e. every octave) with the addition of a subscript (e.g. C4) to indicate how high or low this pitch is relative to some reference pitch. In other words, a numeric subscript difference between two notes that share the same pitch chroma (e.g. C4 vs. C5) reflects a pitch height difference of one or more octaves between two notes."

Hi Thoan, if you have found my comment useful please click the upper triangle: "The answer is useful". — VladP, Jan 18 '18 at 19:04
I really appreciate your answer and I would like to upvote it, but my reputation is under 15 and the system doesn't allow me to upvote anything TTvTT — Toan Nhu, Jan 19 '18 at 08:01
No problem. What really matters is that it helped you. Cheers! — VladP, Jan 19 '18 at 15:01
