The length of the FFT window (n_fft) specifies not only how many samples you need, but also the frequency resolution of the result (longer n_fft, better resolution). To ensure comparable results for many files you probably want to use the same n_fft value for all of them.
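To make the resolution point concrete: the spacing between adjacent FFT bins is sr / n_fft. The following is a minimal sketch, assuming a sampling rate of 22050 Hz (the rate librosa.load resamples to by default); the helper name bin_spacing_hz is made up for illustration.

```python
def bin_spacing_hz(sr, n_fft):
    """Distance in Hz between adjacent FFT bins."""
    return sr / n_fft

sr = 22050  # assumed sampling rate (librosa.load's default)
spacings = {n_fft: bin_spacing_hz(sr, n_fft) for n_fft in (512, 2048, 16384)}

for n_fft, spacing in spacings.items():
    print(f"n_fft={n_fft:>5}: {spacing:.2f} Hz per bin")
```

A longer window gives finer bins, which is why centroids computed with different n_fft values are not directly comparable.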
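With that out of the way, say your files all have no more than 16k (16384) samples. Then you can still achieve a reasonable runtime, since the FFT runs in O(N log N), although this gets worse as your files get longer. You could zero-pad each file to n_fft samples, for example with y = librosa.util.fix_length(y, size=16384), and then call librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=16384, hop_length=16384, center=False). Because hop_length is set to the same value as n_fft, the FFT is computed for non-overlapping windows, and because n_fft is at least as large as the number of samples in any of your files (in this example), you get exactly one value per file. The padding matters because, with center=False, librosa raises an error when the signal is shorter than n_fft. Note that I set center to False to avoid an adjustment (padding the signal so that frames are centered on their timestamps) that is not necessary for your scenario.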
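To illustrate what a single long window computes, here is a minimal NumPy sketch of a one-frame spectral centroid: the magnitude-weighted mean of the bin frequencies. It is not librosa's implementation. The function name single_frame_centroid, the zero-padding to n_fft, and the periodic Hann window are assumptions made for this example.

```python
import numpy as np

def single_frame_centroid(y, sr, n_fft):
    """Spectral centroid (Hz) of one zero-padded, Hann-windowed frame."""
    if len(y) > n_fft:
        raise ValueError("signal is longer than n_fft")
    # Zero-pad the signal so that it fills exactly one frame.
    frame = np.zeros(n_fft)
    frame[:len(y)] = y
    # Periodic Hann window (an assumption; chosen to reduce leakage).
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    mag = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Centroid = magnitude-weighted mean of the bin frequencies.
    return float(np.sum(freqs * mag) / np.sum(mag))

sr, n_fft = 22050, 16384
tone_hz = 743 * sr / n_fft  # a test tone that falls exactly on an FFT bin
t = np.arange(n_fft) / sr
y = np.sin(2 * np.pi * tone_hz * t)

centroid = single_frame_centroid(y, sr, n_fft)
print(tone_hz, centroid)  # the two values should agree closely
```

For a pure tone the centroid sits at the tone's frequency; for real recordings it summarizes where the spectral energy of the whole file is concentrated.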
As an alternative to choosing a long transform window, you could compute values for many overlapping windows (or frames) using the STFT, which is what librosa does anyway, and simply average the resulting values like this:
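import numpy as np
import librosa
# Load librosa's bundled trumpet example (resampled to 22050 Hz by default).
y, sr = librosa.load(librosa.ex('trumpet'))
# One centroid per frame, using the default n_fft=2048 and hop_length=512.
cent = librosa.feature.spectral_centroid(y=y, sr=sr, center=False)
# Average over all frames to get a single value for the file.
avg_cent = np.mean(cent)
print(avg_cent)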
2618.004809523263
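For a sense of how many values each approach yields, the number of frames without centering is 1 + (n_samples - n_fft) // hop_length. The sketch below assumes center=False framing; the helper name n_frames_uncentered is hypothetical, and 2048 and 512 are librosa's default n_fft and hop_length.

```python
def n_frames_uncentered(n_samples, n_fft, hop_length):
    """Number of full frames that fit into the signal without padding."""
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one frame")
    return 1 + (n_samples - n_fft) // hop_length

# One long, non-overlapping window covering a 16384-sample file.
single = n_frames_uncentered(16384, n_fft=16384, hop_length=16384)

# One second of audio at 22050 Hz with the default window and hop.
framed = n_frames_uncentered(22050, n_fft=2048, hop_length=512)

print(single, framed)
```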
The latter solution is in line with what is usually done in MIR (music information retrieval) and is my recommendation. Note that it also allows you to use other statistics such as the median, which may or may not be of interest to you. In other words, you can describe the distribution of the centroids, which arguably carries more meaning than a single value.
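As a rough sketch of looking at the distribution rather than only the mean, the per-frame centroids can be summarized with a few NumPy statistics. The values in cent below are hypothetical and only mimic the (1, n_frames) shape of librosa's output; summarize_centroids is a made-up helper name.

```python
import numpy as np

def summarize_centroids(cent):
    """Summary statistics (Hz) over all per-frame centroid values."""
    values = np.asarray(cent, dtype=float).ravel()
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),
        "p10": float(np.percentile(values, 10)),
        "p90": float(np.percentile(values, 90)),
    }

# Hypothetical per-frame centroids in Hz, shaped (1, n_frames).
cent = np.array([[1000.0, 2000.0, 3000.0, 4000.0, 10000.0]])
stats = summarize_centroids(cent)

for name, value in stats.items():
    print(f"{name}: {value:.1f} Hz")
```

In this made-up data a single outlier frame pulls the mean well above the median, which is the kind of difference the distribution reveals.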