
I am new to both Python and librosa. I am trying to follow this method for the acoustic front end of a speech recognizer.

My code:

import librosa
import librosa.display
import numpy as np

y, sr = librosa.load('test.wav', sr = None)
normalizedy = librosa.util.normalize(y)

stft = librosa.core.stft(normalizedy, n_fft = 256, hop_length=16)
mel = librosa.feature.melspectrogram(S=stft, n_mels=32)
melnormalized = librosa.util.normalize(mel)
mellog = np.log(melnormalized) - np.log(10**-5)

The problem is that when I apply librosa.util.normalize to the variable mel, I expect the values to be between -1 and 1, but they aren't. What am I missing?
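A minimal NumPy sketch of what librosa.util.normalize does with its default arguments (norm=np.inf, axis=0) on a real-valued array: every column, i.e. every frame of a spectrogram, is divided by its largest absolute value. The helper name `normalize_columns` and the toy matrix are hypothetical, made up for illustration; this is not librosa's own code.

```python
import numpy as np

def normalize_columns(S):
    """Hypothetical stand-in for librosa.util.normalize(S) with default
    arguments, for real-valued S: divide each column by its max |value|."""
    S = np.asarray(S, dtype=float)
    peak = np.max(np.abs(S), axis=0, keepdims=True)
    peak[peak == 0] = 1.0  # leave all-zero columns unchanged instead of dividing by zero
    return S / peak

# Toy non-negative "mel spectrogram": 4 mel bands x 3 frames
mel = np.array([[1.0, 0.5, 20.0],
                [4.0, 2.0, 10.0],
                [2.0, 1.0,  5.0],
                [0.1, 0.2,  1.0]])

melnormalized = normalize_columns(mel)
print(melnormalized.max(axis=0))  # every frame now peaks at 1
print(melnormalized.min())        # non-negative input stays non-negative
```

So for a non-negative mel spectrogram the normalized values already lie in [0, 1], and the scaling is per frame rather than over the whole spectrogram. Whatever is applied after normalizing is what decides the final range.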

sabri
  • What are the max and min of your values then? Are you sure that you are checking `melnormalized` and not `mellog` (which will have a different scale since log was applied) – Jon Nordby Mar 17 '19 at 00:53
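The minimum and maximum the comment asks about can be bounded directly. A short derivation, assuming $M_{f,t} > 0$ is the mel spectrogram (mel band $f$, frame $t$), $\varepsilon$ is a small positive offset, and librosa.util.normalize uses its default per-frame max-absolute-value scaling:

```latex
\begin{aligned}
\text{normalize, then log:}\quad
  \hat{M}_{f,t} &= \frac{M_{f,t}}{\max_{f'} M_{f',t}} \in (0, 1]
  \;\Rightarrow\;
  \ln \hat{M}_{f,t} - \ln 10^{-5} \in (-\infty,\; 5\ln 10] \approx (-\infty,\; 11.51] \\
\text{log, then normalize:}\quad
  L_{f,t} &= \ln(M_{f,t} + \varepsilon)
  \;\Rightarrow\;
  \frac{L_{f,t}}{\max_{f'} \lvert L_{f',t} \rvert} \in [-1,\; 1]
\end{aligned}
```

With normalize-then-log, the values top out near 11.51 and have no lower bound. With log-then-normalize, they are bounded by ±1 by construction.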

1 Answer


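If you want your output to be log-scaled and normalized to between -1 and +1, you should log-scale first, then normalize. With its default arguments, librosa.util.normalize divides each frame (column) by its largest absolute value, so its output does lie in [-1, 1]; applying np.log and the offset afterwards moves the values out of that range again: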

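import librosa
import numpy as np

y, sr = librosa.load('test.wav', sr=None)
normalizedy = librosa.util.normalize(y)

stft = librosa.stft(normalizedy, n_fft=256, hop_length=16)
# melspectrogram(S=...) expects a real, non-negative power spectrogram, not the complex STFT
power = np.abs(stft) ** 2
# Pass the file's native sample rate so the mel filterbank matches the audio
mel = librosa.feature.melspectrogram(S=power, sr=sr, n_mels=32)
# Log-scale first (the small offset avoids log(0)) ...
mellog = np.log(mel + 1e-9)
# ... then normalize each frame so its largest absolute value is 1
melnormalized = librosa.util.normalize(mellog)
# use melnormalized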
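A quick NumPy check of the log-then-normalize order that needs no audio file. It assumes random positive values as a hypothetical stand-in for `mel`, and reimplements the default per-frame scaling of librosa.util.normalize as the made-up helper `normalize_columns` (not librosa's code):

```python
import numpy as np

def normalize_columns(S):
    """Hypothetical stand-in for librosa.util.normalize(S) with default
    arguments, for real-valued S: divide each column by its max |value|."""
    S = np.asarray(S, dtype=float)
    peak = np.max(np.abs(S), axis=0, keepdims=True)
    peak[peak == 0] = 1.0  # leave all-zero columns unchanged
    return S / peak

# Fake mel power spectrogram: 32 mel bands x 10 frames, values from 1e-4 to 10
rng = np.random.default_rng(0)
mel = 10.0 ** rng.uniform(-4.0, 1.0, size=(32, 10))

# Log-scale first (offset avoids log(0)), then normalize each frame
mellog = np.log(mel + 1e-9)
melnormalized = normalize_columns(mellog)

print(melnormalized.min(), melnormalized.max())  # both inside [-1, 1]
print(np.abs(melnormalized).max(axis=0))          # every frame reaches magnitude 1
```

Because each frame is scaled independently, frames with very different loudness end up on the same scale. If a single global scale is wanted instead, dividing mellog by np.max(np.abs(mellog)) would do that.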
Jon Nordby
  • [Ikteaja Hasan](https://stackoverflow.com/users/14101949/ikteaja-hasan) posted an [Answer](https://stackoverflow.com/a/64947522/12695027) asking "The above answer is mentioned that the normalized range between -1 and -1. Is it correct or the range should be -1 and 1?" – Scratte Nov 21 '20 at 20:16
    It should be -1 and 1, yes; corrected now. Thanks Ikteaja and Scratte! – Jon Nordby Nov 22 '20 at 00:05