Can someone help me understand the np.abs conversion for STFT in librosa?

Question

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> D = np.abs(librosa.stft(y))
>>> D
array([[2.58028018e-03, 4.32422794e-02, 6.61255598e-01, ...,
        6.82710262e-04, 2.51654536e-04, 7.23036574e-05],
       [2.49403086e-03, 5.15930466e-02, 6.00107312e-01, ...,
        3.48026224e-04, 2.35853557e-04, 7.54836728e-05],
       [7.82410789e-04, 1.05394892e-01, 4.37517226e-01, ...,
        6.29352580e-04, 3.38571583e-04, 8.38094638e-05],
       ...,
       [9.48568513e-08, 4.74725084e-07, 1.50052492e-05, ...,
        1.85637656e-08, 2.89708542e-08, 5.74304337e-09],
       [1.25165826e-07, 8.58259284e-07, 1.11157215e-05, ...,
        3.49099771e-08, 3.11740926e-08, 5.29926236e-09],
       [1.70630571e-07, 8.92518756e-07, 1.23656537e-05, ...,
        5.33256745e-08, 3.33264900e-08, 5.13272980e-09]], dtype=float32)

Why is there an np.abs call in the 2nd line? Why are negatives calculated then?

score 3 · Accepted Answer · answered May 05 '20 at 10:09

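As you can see when you run just the STFT, without np.abs:

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> D_complex = librosa.stft(y)  # renamed so the built-in complex type is not shadowed
>>> print(D_complex)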

[[ 2.46926467e-03+0.0000000e+00j  4.31839712e-02+0.0000000e+00j
   6.61340177e-01+0.0000000e+00j ... -1.06654959e-04+0.0000000e+00j
  -2.90835378e-05+0.0000000e+00j  3.53358846e-05+0.0000000e+00j]
 [ 2.56137503e-03+1.1307890e-19j  5.14071472e-02+5.1062172e-03j
   3.12469959e-01+5.1239032e-01j ... -6.26369513e-07-1.7899552e-05j
   6.21115832e-05+8.9027701e-05j -6.63267638e-05-2.4181936e-05j]
 [ 8.76825710e-04+1.9178635e-20j  9.54191685e-02+4.4643223e-02j
  -9.85670462e-02+4.2620054e-01j ...  1.46014354e-04+8.8074237e-05j
  -1.11950474e-04-1.7414341e-04j  1.29663958e-05+1.1292481e-04j]
 ...
 [ 1.42249689e-07+2.8255210e-20j  6.34592482e-07+1.9654651e-07j
   3.47742980e-06+1.4340003e-05j ...  2.72165117e-08-5.3495475e-09j
   5.09760589e-09+2.3726502e-08j -9.91400628e-10-2.6668809e-09j]
 [-4.12092085e-08+1.3764285e-19j  1.98188317e-07+8.5012516e-07j
  -5.88514422e-06+9.2995169e-06j ...  3.27279501e-08-2.5336826e-08j
   1.27822437e-08-1.9952591e-08j -2.34001551e-09-1.6291880e-09j]
 [-1.97310911e-07+0.0000000e+00j -9.55397468e-07+0.0000000e+00j
  -1.24679464e-05+0.0000000e+00j ... -7.20001267e-08+0.0000000e+00j
  -2.61475943e-08+0.0000000e+00j -2.84717561e-09+0.0000000e+00j]]

librosa.stft(y) returns an array of complex numbers, as one would expect from a Discrete Fourier Transform (DFT). Each complex number gives us the phase and amplitude of one frequency component in one frame of the audio signal. But oftentimes we don't care about the phase (humans can't really perceive it very well anyway) and want to reduce the signal to just the amplitude, which is simply the absolute value of each complex number.
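
To make the amplitude/phase split concrete, here is a minimal sketch that does not need librosa or an audio file. It builds a single hypothetical frame (a 440 Hz sine with a Hann window; the sample rate and frame length are assumed values chosen for illustration) and uses np.fft.rfft as a stand-in for one column of an STFT. It is not what librosa does internally, only an illustration of the same idea.

```python
import numpy as np

# Assumed values for illustration only.
sr = 22050       # sample rate in Hz
n_fft = 2048     # frame length in samples

# One hypothetical frame: a 440 Hz sine, tapered by a Hann window.
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 440.0 * t) * np.hanning(n_fft)

spectrum = np.fft.rfft(frame)    # complex values, one per frequency bin
magnitude = np.abs(spectrum)     # amplitude per bin, never negative
phase = np.angle(spectrum)       # phase per bin, in radians

# Amplitude and phase together carry the same information as the complex values.
rebuilt = magnitude * np.exp(1j * phase)

print(spectrum.dtype, magnitude.dtype)
print(np.allclose(rebuilt, spectrum))
print(magnitude.min() >= 0)
```

Taking np.abs keeps magnitude and throws phase away, which is why the result is a real-valued array.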

It's easy to understand once you imagine each one of these complex numbers as a point on the complex plane.

What you are interested in is the length of the vector between the origin, 0 + 0j, and your number, e.g., z = 1 + 2j. To get that length, you need to compute r = sqrt(1*1 + 2*2) = sqrt(5) ≈ 2.236 (Pythagorean theorem), and that's exactly what np.abs() does for complex numbers.
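
The Pythagorean computation can be checked directly. This is a small sketch with made-up values (the array below is not taken from the STFT output above); it compares the manual formula with np.abs, including for numbers whose real or imaginary parts are negative.

```python
import numpy as np

# The example number from the text.
z = 1 + 2j
r_manual = np.sqrt(z.real**2 + z.imag**2)   # Pythagorean theorem by hand
r_abs = np.abs(z)                           # same length via np.abs
print(r_manual, r_abs)

# Made-up values with negative real and/or imaginary parts.
values = np.array([3 + 4j, -3 - 4j, -1e-3 + 0j])
lengths = np.abs(values)
lengths_manual = np.sqrt(values.real**2 + values.imag**2)
print(lengths)
print(np.allclose(lengths, lengths_manual))
```

The length of a vector cannot be negative, so the result of np.abs on a complex array is always non-negative, whatever the signs of the components.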

This is also nicely explained on Wikipedia.

why are negatives calculated then?

There are no negative numbers in D. I assume you mistook 2.58028018e-03 for a negative number, when it's really just scientific notation for 2.58028018 * 10^-3, i.e., a very small positive number.
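
If the notation is the confusing part, a quick check in plain Python may help. This sketch only uses the first value from the D output in the question; the variable names are mine.

```python
import math

# The e-03 suffix is an exponent, not a sign: it means "times 10 to the power of -3".
x = float("2.58028018e-03")
print(f"{x:.10f}")                           # the same value in plain decimal notation
print(x > 0)                                 # small, but positive
print(math.isclose(x, 2.58028018 * 10**-3))

# A value that really is negative carries its minus sign at the front.
negative = float("-2.58028018e-03")
print(negative < 0)
```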

Ah! I missed the complex number part. I read about phasors today and it makes sense now: I confused the phasor components with the complex outputs, and the np.abs made me think there were negative elements in the array. Thank you for this, @Hendrik! — Akash Sonthalia, May 05 '20 at 10:55
Please consider accepting my answer, if it indeed answered your question. Thanks! — Hendrik, May 05 '20 at 17:00
