
For example, I have a WAV file containing speech.

I can create a nice spectrogram visualization with sox:

wget https://google.github.io/tacotron/publications/tacotron2/demos/romance_gt.wav
sox romance_gt.wav -n spectrogram -o spectrogram.png

[Image: spectrogram.png, the sox spectrogram of romance_gt.wav]

How can I reproduce this spectrogram in python?

Here is an example using scipy.signal.spectrogram:

import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

input_file = 'temp/romance_gt.wav'
fs, x = wavfile.read(input_file)
print('fs', fs)
print('x.shape', x.shape)

f, t, Sxx = signal.spectrogram(x, fs)
print('f.shape', f.shape)
print('t.shape', t.shape)
print('Sxx.shape', Sxx.shape)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.savefig('spectrogram_scipy.png')

But it looks like some parameters are wrong or something is broken:

[Image: spectrogram_scipy.png, the SciPy version, which looks nothing like the sox plot]

mrgloom

1 Answer


Notice the scale of the color bar in the plot generated by sox. The units are dBFS: decibels relative to full scale. To reproduce the plot with SciPy and Matplotlib, you'll need to scale the values so that the maximum is 1, and then take 10*log10 of the values to convert to dB (10 rather than 20, because Sxx is a power spectral density, not an amplitude).
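As a minimal NumPy sketch of that conversion, assuming the input is a non-negative power array such as the Sxx returned by scipy.signal.spectrogram: the helper name to_dbfs and its floor_db clamp are illustrative additions, not part of SciPy. The clamp matters because 10*np.log10(0) is -inf (with a divide-by-zero warning), so any exactly-zero bins would otherwise fall off the color scale.

```python
import numpy as np


def to_dbfs(power, floor_db=-120.0):
    """Convert a power spectrogram to dB relative to its own maximum (a dBFS-style scale).

    power: non-negative array, e.g. Sxx from scipy.signal.spectrogram (hypothetical helper).
    floor_db: lowest level kept; -120 mirrors the 120 dB range sox shows by default.
    """
    power = np.asarray(power, dtype=float)
    ref = power.max()
    if ref <= 0:
        raise ValueError("spectrogram has no positive values")
    # Divide by the maximum so the loudest bin is "full scale" (ratio 1 -> 0 dB).
    # Clamp tiny ratios first so zero-power bins map to floor_db instead of -inf.
    ratio = np.maximum(power / ref, 10.0 ** (floor_db / 10.0))
    # Power quantity, so decibels are 10*log10 of the ratio.
    return 10.0 * np.log10(ratio)
```

With it, the plotting call becomes something like plt.pcolormesh(t, f, to_dbfs(Sxx), vmin=-120, vmax=0).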

Here's a modified version of your script with an assortment of tweaks to the arguments of spectrogram and pcolormesh; it creates a plot similar to the sox output.

import numpy as np
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt

input_file = 'romance_gt.wav'
fs, x = wavfile.read(input_file)
print('fs', fs)
print('x.shape', x.shape)

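# Longer segments give more frequency bins; noverlap = nperseg - 1
# places successive segments just one sample apart.
nperseg = 1025
noverlap = nperseg - 1
f, t, Sxx = signal.spectrogram(x, fs,
                               nperseg=nperseg,
                               noverlap=noverlap,
                               window='hann')
print('f.shape', f.shape)
print('t.shape', t.shape)
print('Sxx.shape', Sxx.shape)
# Normalize to the maximum ("full scale"), convert power to dB, show a
# 120 dB range, and rescale the axes to milliseconds and kilohertz.
plt.pcolormesh(1000*t, f/1000, 10*np.log10(Sxx/Sxx.max()),
               vmin=-120, vmax=0, cmap='inferno')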
plt.ylabel('Frequency [kHz]')
plt.xlabel('Time [ms]')
plt.colorbar()
plt.savefig('spectrogram_scipy.png')

I divided Sxx by Sxx.max() to account for the "full-scale" aspect of dBFS. I increased nperseg and noverlap from their defaults (nperseg=256 and noverlap=nperseg // 8) to get finer sampling along both the frequency and time axes: more frequency bins, and a hop of only one sample between segments. I used window='hann' to match the default behavior of sox; SciPy's default window is a Tukey window, ('tukey', 0.25). (You can find details for the sox spectrogram at http://sox.sourceforge.net/sox.html.) I also used vmin=-120 and vmax=0 in pcolormesh to match the default 120 dB range used by the sox spectrogram.
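To see what those nperseg and noverlap choices do to the output grid, here is a small sketch that predicts the shape and spacing of what scipy.signal.spectrogram returns for a real 1-D signal. The function spectrogram_grid is hypothetical (not a SciPy API), and it assumes the defaults return_onesided=True and nfft=None; the sample rate of 24000 Hz and the 2-second length below are made-up values for illustration, not properties of romance_gt.wav.

```python
def spectrogram_grid(n_samples, fs, nperseg=256, noverlap=None):
    """Predict (n_freqs, n_times, df, dt) for scipy.signal.spectrogram on a real 1-D signal.

    Hypothetical helper. Assumes return_onesided=True, nfft=None (so nfft == nperseg),
    and a signal longer than nperseg; noverlap=None follows SciPy's nperseg // 8 default.
    """
    if noverlap is None:
        noverlap = nperseg // 8
    step = nperseg - noverlap                 # hop between successive segments, in samples
    n_freqs = nperseg // 2 + 1                # one-sided FFT bins from 0 to fs/2
    n_times = (n_samples - noverlap) // step  # segments that fit entirely inside the signal
    df = fs / nperseg                         # spacing between frequency bins, Hz
    dt = step / fs                            # spacing between segment centres, s
    return n_freqs, n_times, df, dt


if __name__ == "__main__":
    fs = 24000           # assumed sample rate, for illustration only
    n = 2 * fs           # assumed 2-second clip
    print("defaults:", spectrogram_grid(n, fs))
    print("answer:  ", spectrogram_grid(n, fs, nperseg=1025, noverlap=1024))
```

Under those assumptions the defaults give a 129 x 214 grid with 93.75 Hz bins, while nperseg=1025, noverlap=1024 gives 513 x 46976 with about 23.4 Hz bins. The trade-off is that each column now summarizes a 1025-sample window (roughly 43 ms at 24 kHz), so the one-sample hop makes the image smoother along time rather than sharpening time detail, and Sxx becomes much larger.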

Here's the plot:

[Image: spectrogram_scipy.png, the SciPy/Matplotlib spectrogram produced by the script above]

The "inferno" colormap isn't as intense as the one used in the sox plot. See the tutorial on "Choosing Colormaps in Matplotlib" for alternative colormaps.

Warren Weckesser