
I am training a neural network on audio data and would like to inspect some of its internal representations. One of them is essentially a magnitude spectrogram without phase information, but with high overlap between the Hann windows.

Is there a way I can use tf.contrib.signal.inverse_stft to generate an audio signal from this magnitude-only spectrogram? If not, is there some other straightforward way to do this (e.g. something amounting to a sum of band-pass filters applied to white noise)?

Cris Luengo
Anaphory

1 Answer


I don't know much about tf's inverse_stft; it seems to require a complementary window function in order to work.

But to estimate the original waveform from its STFT without phase information, you might want to look at either the Griffin-Lim algorithm or a WaveNet vocoder conditioned on a mel spectrogram (which can be derived from the linear spectrogram produced by the STFT).

Griffin-Lim algorithm: https://github.com/bkvogel/griffin_lim

WaveNet vocoder: https://github.com/r9y9/wavenet_vocoder
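
To give a rough idea of what Griffin-Lim does, here is a minimal NumPy sketch (not the code from the repository linked above, and not TensorFlow). It starts from random phase, which is close in spirit to the "filtered white noise" idea in the question, and then alternates between going back to the time domain and re-imposing the known magnitudes. The function names `stft`, `istft` and `griffin_lim` and the parameters `n_fft`, `hop`, `n_iter` and `seed` are my own assumptions for illustration; the frame size and hop would have to match whatever produced your spectrogram.

```python
import numpy as np


def stft(x, n_fft=512, hop=64):
    """Hann-windowed STFT; returns shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)


def istft(spec, n_fft=512, hop=64):
    """Windowed overlap-add inverse of `stft`, normalised by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    for i in range(n_frames):
        x[i * hop:i * hop + n_fft] += frames[i]
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Floor the normaliser so the sparsely covered edges do not blow up.
    return x / np.maximum(norm, 1e-3)


def griffin_lim(magnitude, n_fft=512, hop=64, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Initial guess: uniformly random phase for every time-frequency bin.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        estimate = stft(signal, n_fft, hop)
        # Keep only the phase of the re-analysed signal; magnitudes stay fixed.
        phase = estimate / np.maximum(np.abs(estimate), 1e-8)
    return istft(magnitude * phase, n_fft, hop)
```

With a magnitude array `mag` of shape `(n_frames, n_fft // 2 + 1)`, `griffin_lim(mag)` returns a 1-D float array of length `n_fft + hop * (n_frames - 1)`. High overlap between windows, as in your case, generally helps because it makes the spectrogram more redundant. The first and last few samples are covered by only one window, so you may want to trim them.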

Edy