Librosa's inverse mel spectrogram to stft taking a long time

Question

I am trying to convert a mel spectrogram back into audio, but librosa's mel_to_stft function takes a long time (upwards of 15 minutes) to process a 30-second .wav file sampled at 384 kHz.

The following is my code:

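import librosa
import samplerate
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

# Code for high pass filter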
def butter_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, normal_cutoff, btype='high', analog=False)
    return b, a

def butter_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_highpass(cutoff, fs, order=order)
    y = filtfilt(b, a, data)
    return y

def high_pass_filter(data, sr): 
    # set as a highpass filter for 500 Hz
    filtered_signal = butter_highpass_filter(data, 500, sr, order=5)
    return filtered_signal

example_dir = '/Test/test.wav'
sr, data = wavfile.read(example_dir)
des_sr = 44100
data_resamp = samplerate.resample(data, des_sr/sr, 'sinc_best')
data_hp = high_pass_filter(data_resamp, des_sr)
mel_spect = librosa.feature.melspectrogram(y=data_hp, sr=des_sr)
S = librosa.feature.inverse.mel_to_stft(mel_spect)
y = librosa.griffinlim(S)

Are you sure it's mel_to_stft that is taking a long time, and not the griffinlim call? — Jon Nordby, Aug 08 '20 at 14:36
What do you aim to achieve by converting to a mel spectrogram and then back to a waveform? In the example given I do not see any processing in the (mel) spectral domain — Jon Nordby, Aug 08 '20 at 14:37
@jonnor I can confirm that it's the inverse operation itself that takes a long time: More precisely, it's the call to `librosa.util._nnls` — Josh Greifer, Aug 25 '20 at 15:28

score 0 · Answer 1 · answered Aug 08 '20 at 14:45

Griffin-Lim is an iterative method for estimating the phase information needed to go from a magnitude-only spectrogram back to a waveform. The number of iterations in the librosa implementation can be adjusted (n_iter). Reducing it will speed things up a bit, but the method is slow in general.

Going back to a waveform after spectral processing can be sped up by:

Using a one-shot approximate method such as a neural network, for example "Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks".
Using the original phase information instead of estimating it from the modified magnitude spectrogram. This requires the phase spectrogram to be available (not just the magnitude), which is often the case when doing spectral processing on audio files.
