While comparing the outputs of SciPy's STFT and librosa's STFT, I found that the number of time bins in the two 2D output arrays differs by one. Specifically, SciPy's Zxx has shape (513, 341), while librosa's stft has shape (513, 340). I printed the time stamps for each output and found that the SciPy times begin at 0 seconds, whereas the librosa times begin at the first hop. I may be missing something very basic here, but I can't figure out why this discrepancy occurs. Thanks in advance for the help!
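To make the frame-count comparison concrete, here is a minimal sketch of the arithmetic, assuming the defaults as I understand them: SciPy's `signal.stft` with `boundary='zeros'` and `padded=True`, and librosa's `stft` with `center=True`. The helper names and the sample count are hypothetical (they are not library functions, and the length of the actual audio file is not given here), so treat this as an illustration rather than a confirmed explanation.

```python
import math

def scipy_style_frames(n_samples, nperseg=1024, hop=512):
    # Assumption: boundary='zeros' extends the signal by nperseg // 2 on each side,
    # and padded=True zero-pads the end so a whole number of hops fits.
    extended = n_samples + 2 * (nperseg // 2)
    n_hops = math.ceil((extended - nperseg) / hop)
    return n_hops + 1

def librosa_style_frames(n_samples, n_fft=1024, hop=512):
    # Assumption: center=True pads n_fft // 2 on each side,
    # and a trailing partial frame is dropped rather than padded.
    padded = n_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // hop

# Hypothetical length that is not a multiple of the hop size.
n = 339 * 512 + 100
print(scipy_style_frames(n), librosa_style_frames(n))  # 341 340

# When the length is an exact multiple of the hop, the two counts agree.
m = 340 * 512
print(scipy_style_frames(m), librosa_style_frames(m))  # 341 341
```

Under these assumptions the two counts differ by one exactly when the signal length is not a multiple of the hop size, which would be consistent with the (513, 341) versus (513, 340) shapes reported above.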

Audio file used: https://clyp.it/e3thsdpo

import numpy as np
from scipy import signal
import librosa
import librosa.display
import matplotlib.pyplot as plt

infile='fox_rain.wav'
print('load wav', infile)
n_fft = 1024
hop_length = n_fft // 2  # 50% overlap between frames
data, samplerate = librosa.load(infile, sr=None, mono=True)  # sr=None keeps the native sample rate
stft = librosa.stft(data, n_fft=n_fft, hop_length=hop_length)
stft_magnitude = np.abs(stft)
angle = np.angle(stft) #phase of the stft
b = np.exp(1.0j* angle) #phase info

# noverlap defaults to nperseg // 2, which matches hop_length above
f, t, Zxx = signal.stft(data, fs=samplerate, nperseg=n_fft)
frequency_bins = f
scipy_time_bins = t

librosa_time_bins = librosa.frames_to_time(range(stft.shape[1]), sr=samplerate, hop_length=hop_length, n_fft=n_fft)


print(f"scipy_time_bins = {scipy_time_bins}")
print(f"librosa_time_bins = {librosa_time_bins}")

length = data.shape[0] / samplerate
time = np.linspace(0., length, data.shape[0])

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(13, 8))

ax[0].plot(time, data)
ax[0].set_xlabel("Time [s]")
ax[0].set_ylabel("Amplitude")

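# Draw the spectrogram explicitly on the second axes and attach the colorbar to it
img = librosa.display.specshow(librosa.amplitude_to_db(stft_magnitude, ref=np.max), y_axis='log', x_axis='time', sr=samplerate, hop_length=hop_length, ax=ax[1])
fig.colorbar(img, ax=ax[1], format='%+2.0f dB')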

fig.tight_layout()
plt.show()
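For the time stamps specifically, the following sketch mimics how I understand `librosa.frames_to_time` to treat its optional `n_fft` argument: when it is given, an offset of `n_fft // 2` samples is added to every frame position. The function `frame_times` and the 44100 Hz sample rate are hypothetical stand-ins for illustration only, not librosa code and not the rate of the actual file.

```python
def frame_times(n_frames, sr, hop_length, n_fft=None):
    # Assumption: passing n_fft shifts every frame by n_fft // 2 samples,
    # which is intended for a non-centered STFT.
    offset = n_fft // 2 if n_fft is not None else 0
    return [(i * hop_length + offset) / sr for i in range(n_frames)]

sr = 44100  # hypothetical sample rate
with_offset = frame_times(3, sr, hop_length=512, n_fft=1024)
without_offset = frame_times(3, sr, hop_length=512)

print(with_offset[0] == 512 / sr)  # True: first time stamp sits one hop in
print(without_offset[0])           # 0.0
```

If that reading is right, the librosa times in the script above start at the first hop because `n_fft=n_fft` is passed together with a centered STFT, and omitting `n_fft` would make them start at 0 seconds like the SciPy times.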
SuperKogito