How do I call a librosa function on the entire audio file?

Question

I have short audio files which I'm trying to analyze using Librosa, in particular the spectral centroid function. However, this function outputs an array of values representing the spectral centroid at different frames within the audio file. The documentation says that the frame size can be changed by specifying the parameter n_fft when calling the function. It would be more useful to me if this function analyzed the entire audio file at once rather than outputting the result at multiple points in time. Is there a way to specify that I want the function to be called with, say, a frame size equal to the entire audio file instead of the default, which is 2048 samples? Or is there a better way?

Cheers and thank you!

score 1 · Accepted Answer · answered Dec 31 '20 at 10:41

The length of the FFT window (n_fft) specifies not only how many samples you need, but also the frequency resolution of the result (longer n_fft, better resolution). To ensure comparable results for many files you probably want to use the same n_fft value for all of them.
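To put numbers on the resolution point, here is a small sketch. The spacing between adjacent FFT bins is sr / n_fft; the helper name bin_spacing_hz is made up for illustration, and 22050 Hz is assumed because it is the default sample rate of librosa.load.

```python
def bin_spacing_hz(sr: int, n_fft: int) -> float:
    """Spacing between adjacent FFT bins in Hz (hypothetical helper)."""
    return sr / n_fft

sr = 22050  # assumed: librosa.load's default sample rate

# Compare librosa's default window with the longer one discussed in this answer
for n_fft in (2048, 16384):
    print(n_fft, round(bin_spacing_hz(sr, n_fft), 2))
```

With these assumed values the bins are roughly 10.8 Hz apart for n_fft=2048 and roughly 1.3 Hz apart for n_fft=16384.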

With that out of the way, say your files all have no more than 16k samples. Then you can still achieve a reasonable runtime (the FFT runs in O(N log N)), although this gets worse as your files get longer. You could call spectral_centroid(y=y, sr=sr, n_fft=16384, hop_length=16384, center=False). Because hop_length is set to the same value as n_fft, the FFT is computed for non-overlapping windows, and because n_fft is at least as large as the longest of your files (in this example), you should get only one value. Note that I set center to False to avoid padding that is not necessary for your scenario.

One caveat: as far as I know, with center=False librosa needs at least n_fft samples to form a frame, so a file shorter than n_fft has to be zero-padded to that length first, for example with librosa.util.fix_length(y, size=16384).
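As a rough sketch of what "one value for the whole file" means, the centroid can also be computed directly from a single FFT over all samples using only NumPy. This follows the textbook definition (the magnitude-weighted mean frequency) and is not librosa's implementation; librosa applies a Hann window by default, so its numbers will differ somewhat. The function name whole_signal_centroid and the 440 Hz test tone are made up for illustration.

```python
import numpy as np

def whole_signal_centroid(y: np.ndarray, sr: int) -> float:
    """Spectral centroid of the full signal from one FFT (no framing, no window)."""
    mag = np.abs(np.fft.rfft(y))                  # magnitude spectrum
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)   # bin centre frequencies in Hz
    total = mag.sum()
    if total == 0:
        return 0.0                                # silent input: avoid dividing by zero
    return float((freqs * mag).sum() / total)     # magnitude-weighted mean frequency

# Synthetic stand-in for an audio file: one second of a 440 Hz sine
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)

centroid = whole_signal_centroid(y, sr)
print(centroid)  # should be close to 440 Hz for this pure tone
```

Note that the frequency resolution here depends on the length of each file, which is exactly why a fixed n_fft is preferable when results need to be comparable across files.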

Instead of choosing a long transform window, you could also compute values for many overlapping windows (frames) using the STFT (which is what librosa does anyway) and simply average the results, like this:

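import numpy as np
import librosa

# Load librosa's bundled trumpet example (downloaded on first use)
y, sr = librosa.load(librosa.ex('trumpet'))
# Frame-wise spectral centroid with the default n_fft=2048; shape is (1, n_frames)
cent = librosa.feature.spectral_centroid(y=y, sr=sr, center=False)
# Collapse the per-frame values into a single number for the whole file
avg_cent = np.mean(cent)
print(avg_cent)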

2618.004809523263

The latter approach is what is usually done in MIR (music information retrieval), and it is my recommendation. Note that it also lets you use other statistics such as the median, which may or may not be of interest to you. In other words, you can look at the distribution of the centroids, which arguably carries more meaning than a single value.
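A minimal sketch of that idea, assuming cent is the (1, n_frames) array returned by spectral_centroid. The helper summarize_centroids and the sample values are invented for illustration; they are not real measurements.

```python
import numpy as np

def summarize_centroids(cent: np.ndarray) -> dict:
    """Summary statistics over frame-wise centroid values (hypothetical helper)."""
    values = np.asarray(cent, dtype=float).ravel()  # flatten (1, n_frames) to 1-D
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),
        "p10": float(np.percentile(values, 10)),
        "p90": float(np.percentile(values, 90)),
    }

# Placeholder values shaped like librosa's output, with one outlier frame
cent = np.array([[1000.0, 1200.0, 1100.0, 5000.0, 1150.0]])
stats = summarize_centroids(cent)
print(stats)
```

In this made-up example the single outlier frame pulls the mean (1890.0) well above the median (1150.0), which is the kind of difference a single whole-file value would hide.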
