How can I reverse a scipy.signal.spectrogram to audio with Python?

Question

I have:

import librosa
from scipy import signal 
import scipy.io.wavfile as sf    

samples, sample_rate = sf.read(args.file)
nperseg = int(sample_rate * 0.001 * 20)
frequencies, times, spectrogram = signal.spectrogram(samples, 
                                                     sample_rate, 
                                                     nperseg=nperseg, 
                                                     window=signal.hann(nperseg))

audio_signal = librosa.griffinlim(spectrogram)
print(audio_signal, audio_signal.shape)

sf.write('test.wav', audio_signal, sample_rate)

However, this produces a (near) empty sound file.

To use Griffin Lim, you need a magnitude spectrogram. I'd try to specify the mode in your `signal.spectrogram(... mode='magnitude')` call. Haven't tested. — Hendrik, Feb 24 '20 at 14:09
I can't comment on the librosa library. Assuming that is not the problem, did you try scipy.io.wavfile.read and scipy.io.wavfile.write for reading and writing the audio file? Note that the order changes from (signal, sample_rate) to (sample_rate, signal). (https://docs.scipy.org/doc/scipy/reference/io.html#module-scipy.io.wavfile) — DrSpill, Feb 27 '20 at 11:17

SuperKogito · Accepted Answer · 2020-10-13T09:24:21.647

As @DrSpill mentioned, the argument order for scipy.io.wavfile.read and scipy.io.wavfile.write was wrong, and the librosa call was not correct either. This should do it:

import librosa
import numpy as np
import scipy.signal
import scipy.io.wavfile

# read file (scipy returns the sample rate first, then the data)
file    = "temp/processed_file.wav"
fs, sig = scipy.io.wavfile.read(file)
nperseg = int(fs * 0.001 * 20)

# process
# scipy.signal.hann is gone from the top-level namespace in recent SciPy
# releases; scipy.signal.windows.hann is the current location.
frequencies, times, spectrogram = scipy.signal.spectrogram(sig,
                                                           fs,
                                                           nperseg=nperseg,
                                                           window=scipy.signal.windows.hann(nperseg))
audio_signal = librosa.griffinlim(spectrogram)
print(audio_signal, audio_signal.shape)

# write output (scipy expects the sample rate before the data)
scipy.io.wavfile.write('test.wav', fs, np.array(audio_signal, dtype=np.int16))

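Remark: The resulting file had an accelerated tempo when I listened to it. I think this is due to your processing, most likely a hop-length mismatch: signal.spectrogram with its default noverlap (nperseg // 8) advances by nperseg - nperseg // 8 samples per frame, while griffinlim assumes a default hop of n_fft // 4. With some tweaking it should work.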

A good alternative would be to use only librosa, like this:

import librosa
import numpy as np
import soundfile as sf

# read file
file    = "temp/processed_file.wav"
sig, fs = librosa.load(file, sr=8000)

# process
abs_spectrogram = np.abs(librosa.stft(sig))
audio_signal = librosa.griffinlim(abs_spectrogram)

print(audio_signal, audio_signal.shape)

# write output (librosa.output.write_wav was removed in librosa 0.8,
# so soundfile.write is used instead)
sf.write('test2.wav', audio_signal, fs)

Does the reconstructed file have the same number of samples? — Shamoon, Feb 28 '20 at 18:22
Sorry for the late answer, I just further tested this and in the second solution, one should specify the input sampling rate (I edited the code accordingly). Using [ffmpeg](https://www.ffmpeg.org/) I verified that the input and output signals have the same sampling rates. However, you should be careful about the bitrates and the encoding when using the second solution. For more please refer to the [librosa-documentation](https://librosa.github.io/librosa/). — SuperKogito, Mar 01 '20 at 19:01

score 2 · Answer 2 · answered Jan 15 '21 at 17:22

librosa.output was removed: librosa no longer provides its deprecated output module. Try soundfile.write instead:

import numpy as np
import soundfile as sf
sf.write('stereo_file.wav', np.random.randn(10, 2), 44100, 'PCM_24')

# Per your code, you could try:
sf.write('test.wav', audio_signal, sample_rate, 'PCM_24')
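If you would rather keep scipy.io.wavfile.write from the accepted answer, note that Griffin-Lim returns floating-point samples, and casting floats straight to np.int16 truncates small values toward zero. Here is a minimal sketch of an explicit conversion, assuming the audio lies roughly within [-1.0, 1.0] (as it typically does when the input was loaded with librosa.load). The helper name float_to_int16 is hypothetical.

```python
import numpy as np


def float_to_int16(audio):
    """Scale float audio in [-1.0, 1.0] to 16-bit PCM, clipping out-of-range values."""
    clipped = np.clip(np.asarray(audio, dtype=np.float64), -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)


# Small demonstration, including one out-of-range sample (1.5) that gets clipped.
pcm = float_to_int16(np.array([0.0, 0.5, -1.0, 1.5]))
print(pcm, pcm.dtype)
```

The result could then be passed as the data argument, e.g. scipy.io.wavfile.write('test.wav', fs, float_to_int16(audio_signal)).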
