First, this function is supposed to remove silence from an audio signal. Here is the official description:
https://librosa.github.io/librosa/generated/librosa.effects.split.html
librosa.effects.split(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512)
Split an audio signal into non-silent intervals.
top_db : number > 0. The threshold (in decibels) below reference to consider as silence.
Returns: intervals : np.ndarray, shape=(m, 2). intervals[i] == (start_i, end_i) are the start and end time (in samples) of non-silent interval i.
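To make the phrase "below reference" concrete, here is a rough pure-Python sketch of one way to read it. It assumes the reference is the loudest frame in the signal (my reading of the ref=np.max default), and the helper names frame_db_below_peak and non_silent_frames are made up for illustration. It is not librosa's actual implementation.

```python
import math

def frame_db_below_peak(y, frame_length=4, hop_length=4):
    """Return each frame's RMS level in dB relative to the loudest frame (always <= 0).

    Hypothetical helper; assumes len(y) >= frame_length.
    """
    rms = []
    for start in range(0, len(y) - frame_length + 1, hop_length):
        frame = y[start:start + frame_length]
        rms.append(math.sqrt(sum(v * v for v in frame) / len(frame)))
    peak = max(rms)
    floor = 1e-10  # avoid log10(0) on digitally silent frames
    return [20.0 * math.log10(max(r, floor) / max(peak, floor)) for r in rms]

def non_silent_frames(y, top_db, frame_length=4, hop_length=4):
    """Mark a frame as non-silent when it is less than top_db below the peak frame."""
    levels = frame_db_below_peak(y, frame_length, hop_length)
    return [level > -top_db for level in levels]

# Toy signal: one loud frame, one frame 20 dB quieter, one frame 60 dB quieter.
toy = [1.0] * 4 + [0.1] * 4 + [0.001] * 4
print(non_silent_frames(toy, top_db=10))  # [True, False, False]
print(non_silent_frames(toy, top_db=30))  # [True, True, False]
```

Under this reading the threshold is measured downward from the loudest frame, not as an absolute level, so a larger top_db keeps more frames rather than fewer.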
So this seems quite straightforward: any sound quieter than top_db (10 dB, say) is treated as silence and removed from the audio. The function returns a list of intervals, which are the non-silent segments of the audio.
So I tried a very simple example, and the result confuses me. The audio I load here is a 3-second clip of a human talking, just normal speech.
import librosa

y, sr = librosa.load(file_list[0])  # load the data
print(y.shape)  # -> (87495,)

intervals = librosa.effects.split(y, top_db=100)
intervals  # -> array([[0, 87495]])

# if I change 100 to 10
intervals = librosa.effects.split(y, top_db=10)
intervals
# -> array([[19456, 23040],
#           [27136, 31232],
#           [55296, 58880],
#           [64512, 67072]])
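For reference, this is how I read the second result in seconds. It is a small sketch that assumes sr is 22050 (the librosa.load default, since I did not pass sr), and intervals_to_seconds is a helper name I made up, not a librosa function.

```python
def intervals_to_seconds(intervals, sr):
    """Convert (start, end) sample indices into (start, end) times in seconds."""
    return [(start / sr, end / sr) for start, end in intervals]

sr = 22050  # assumption: librosa.load was called without sr, so the default applies
total_samples = 87495  # from y.shape above
intervals = [(19456, 23040), (27136, 31232), (55296, 58880), (64512, 67072)]

for start_s, end_s in intervals_to_seconds(intervals, sr):
    print(f"{start_s:.2f}s to {end_s:.2f}s")

# Number of samples covered by the returned intervals.
kept = sum(end - start for start, end in intervals)
print(f"intervals cover {kept} of {total_samples} samples")
```

So with top_db=10 the returned intervals cover only a small part of the clip, in four short pieces.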
How is this possible?
I tell librosa: for any sound below 100 dB, treat it as silence. With this setting, the whole audio should be treated as silence, and based on the documentation it should give me something like array([[0, 0]]), because after removing the silence there is nothing left.
But it seems librosa returns the silent parts instead of the non-silent parts.