normalizing mel spectrogram to unit peak amplitude?

Question

I am new to both Python and librosa. I am trying to follow this method for a speech recognizer: acoustic front end

My code:

    import librosa
    import librosa.display
    import numpy as np

    y, sr = librosa.load('test.wav', sr = None)
    normalizedy = librosa.util.normalize(y)

    stft = librosa.core.stft(normalizedy, n_fft = 256, hop_length=16)
    mel = librosa.feature.melspectrogram(S=stft, n_mels=32)
    melnormalized = librosa.util.normalize(mel)
    mellog = np.log(melnormalized) - np.log(10**-5)

The problem is that when I apply `librosa.util.normalize` to the variable `mel`, I expect the values to lie between -1 and 1, but they don't. What am I missing?

What are the max and min of your values then? Are you sure that you are checking `melnormalized` and not `mellog` (which will have a different scale since log was applied) — Jon Nordby, Mar 17 '19 at 00:53
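
To illustrate what the comment is getting at, here is a small sketch on synthetic data. It assumes a made-up non-negative array in place of the real mel spectrogram, and `peak_normalize` is a hypothetical NumPy stand-in for max-abs normalization, not librosa's own implementation. It prints the range of both variables so they can be compared:

```python
import numpy as np

def peak_normalize(x, axis=0):
    """Hypothetical stand-in: divide by the peak absolute value along `axis`."""
    peak = np.max(np.abs(x), axis=axis, keepdims=True)
    # Guard against all-zero slices (a choice made for this sketch only).
    return x / np.where(peak == 0, 1.0, peak)

# Synthetic, strictly positive values standing in for mel power.
rng = np.random.default_rng(0)
mel = rng.uniform(1e-3, 50.0, size=(32, 100))

melnormalized = peak_normalize(mel)
mellog = np.log(melnormalized) - np.log(10**-5)

# The normalized array stays within [0, 1]; the log-shifted one does not.
print("melnormalized:", melnormalized.min(), melnormalized.max())
print("mellog:       ", mellog.min(), mellog.max())
```

Taking the log after normalizing, and then subtracting `np.log(10**-5)`, shifts the values well outside the normalized range, so `mellog` is the wrong variable to inspect when checking the normalization.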

Jon Nordby · Answer 1 · 2020-11-22T00:03:49.047

If you want your output to be log-scaled and normalized to between -1 and +1, you should log-scale first, then normalize:

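    import librosa
    import numpy as np

    y, sr = librosa.load('test.wav', sr=None)
    normalizedy = librosa.util.normalize(y)

    stft = librosa.stft(normalizedy, n_fft=256, hop_length=16)
    # melspectrogram expects a power spectrogram for S, not the complex STFT
    power = np.abs(stft) ** 2
    # pass sr so the mel filter bank matches the file's native sample rate
    mel = librosa.feature.melspectrogram(S=power, sr=sr, n_mels=32)
    # the small offset avoids log(0)
    mellog = np.log(mel + 1e-9)
    # by default (axis=0) each frame is scaled by its own peak absolute value
    melnormalized = librosa.util.normalize(mellog)
    # use melnormalized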
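
One thing to keep in mind: with its default `axis=0`, `librosa.util.normalize` scales each frame (column) separately, so every frame ends up with a peak absolute value of 1. If you want a single unit peak over the whole spectrogram instead, you can divide by the global maximum yourself. A rough sketch of the difference, using plain NumPy on synthetic data (the array below is made up, not a real mel spectrogram):

```python
import numpy as np

# Synthetic, strictly positive values standing in for mel power.
rng = np.random.default_rng(0)
mel = rng.uniform(1e-6, 50.0, size=(32, 100))
mellog = np.log(mel + 1e-9)

# Per-frame scaling: each column is divided by its own peak absolute value.
per_frame = mellog / np.max(np.abs(mellog), axis=0, keepdims=True)

# Global scaling: the whole array is divided by one peak absolute value.
global_peak = mellog / np.max(np.abs(mellog))

print("per-frame peaks (first 3):", np.abs(per_frame).max(axis=0)[:3])
print("global peak:", np.abs(global_peak).max())
print("smallest frame peak after global scaling:",
      np.abs(global_peak).max(axis=0).min())
```

Both versions stay within -1 and 1. Per-frame scaling throws away the loudness differences between frames, while global scaling keeps them.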

edited Nov 22 '20 at 00:03

answered Apr 12 '19 at 19:26

Jon Nordby


[Ikteaja Hasan](https://stackoverflow.com/users/14101949/ikteaja-hasan) posted an [Answer](https://stackoverflow.com/a/64947522/12695027) asking "The above answer is mentioned that the normalized range between -1 and -1. Is it correct or the range should be -1 and 1?" – Scratte Nov 21 '20 at 20:16

It should be -1 and 1, yes. Corrected now. Thanks Ikteaja and Scratte! – Jon Nordby Nov 22 '20 at 00:05
