Return value of librosa.effects.split is strange

Question

As the title says, the result of this function seems illogical, and I don't understand what the function is doing.

For example, here is some reproducible code:

import librosa
import matplotlib.pyplot as plt
import numpy as np

# load sample audio
filename = librosa.util.example_audio_file()
audio, sr = librosa.load(filename)

# get the intervals which are non-silent
inter_20 = librosa.effects.split(audio, top_db=20)
inter_5 = librosa.effects.split(audio, top_db=5)

# create audio that is zero everywhere except in the non-silent intervals
above_20 = np.zeros(audio.shape)
above_5 = np.zeros(audio.shape)

for start, end in inter_20:
    above_20[start:end] = audio[start:end]

for start, end in inter_5:
    above_5[start:end] = audio[start:end]

# plot them out:
plt.figure(figsize=[15, 3])  # figure 1
plt.plot(audio)
plt.plot(above_5, color='red')
plt.title('Audio above 5 dB')

plt.figure(figsize=[15, 3])  # figure 2
plt.plot(audio)
plt.plot(above_20, color='red')
plt.title('Audio above 20 dB')

plt.show()

You can see the result here. Figure 1, which is the audio above 5 dB:

[Figure 1: audio above 5 dB]

Figure 2, which is the audio above 20 dB:

[Figure 2: audio above 20 dB]

How can there be more audio above 20 dB than above 5 dB? This doesn't make sense to me.

Are you trying to denoise some audio? As in, you have some audio where a person is speaking and there is some background noise that you want to remove? I don't think there is anything wrong with librosa's `split()` return value. – Ahmad Moussa, Nov 20 '19 at 14:05
@Ahmad Moussa, what I want to do is remove silence. I have some real data that contains several silent segments, and I want to use this function to remove them. – BarCodeReader, Nov 21 '19 at 05:03
@BarCodeReader, did you solve your problem? I have a similar one: I have a recording of a person speaking and want to remove all the long pauses between sentences. I'm struggling with librosa right now and eventually wound up here. – George Zorikov, Jun 18 '20 at 00:02
From the answer below, it seems there is a default reference value K in librosa, for example K = 15 dB. When you set top_db = 5, this does not mean sounds below 5 dB; it means sounds below K - 5 = 10 dB will be treated as silence. I think this K value can be set in librosa. But TBH, the function here is a bit confusing and misleading. – BarCodeReader, Jun 19 '20 at 02:27
I agree, top_db is quite a confusing term. Basically, anything below (max_db - top_db) is treated as "silence" and removed, as explained by Pieter21 below. – HopeKing, Sep 05 '20 at 19:23

Pieter21 · Accepted Answer · 2019-11-20T12:11:49.810

3

From the documentation at: https://librosa.github.io/librosa/generated/librosa.effects.split.html

top_db:number > 0

  The threshold (in decibels) **below** reference to consider as silence

I think top_db=20 means that everything below (TOP - 20 dB), rather than everything below 20 dB, is considered silence, where TOP is the reference level.

More of the signal lies above TOP - 20 dB than above TOP - 5 dB, which would also explain your plots.
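
To make the relative threshold concrete, here is a minimal pure-Python sketch of the idea. It is a simplified model and not librosa's actual implementation: the helper names `frame_levels_db` and `non_silent_frames` are made up for illustration, the frames do not overlap, and the reference is assumed to be the loudest frame.

```python
import math


def frame_levels_db(signal, frame_length=4):
    """RMS level of each non-overlapping frame, in dB relative to the loudest frame."""
    rms = []
    for i in range(0, len(signal) - frame_length + 1, frame_length):
        frame = signal[i:i + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    ref = max(rms)  # assumption: the reference is the maximum frame level
    # The small floor avoids log10(0) for completely silent frames.
    return [20 * math.log10(max(r, 1e-10) / ref) for r in rms]


def non_silent_frames(signal, top_db, frame_length=4):
    """Indices of frames that are less than top_db below the reference."""
    levels = frame_levels_db(signal, frame_length)
    return [i for i, db in enumerate(levels) if db > -top_db]


# Three frames: loud (0 dB), medium (about -10.5 dB), quiet (about -26 dB).
signal = [1.0] * 4 + [0.3] * 4 + [0.05] * 4

print(non_silent_frames(signal, top_db=5))   # [0]
print(non_silent_frames(signal, top_db=20))  # [0, 1]
```

A larger top_db lowers the threshold, so more frames count as non-silent, which matches the plots in the question.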

edited Nov 20 '19 at 12:11

answered Nov 20 '19 at 09:46

Pieter21


top_db=20 means that anything below 20 dB is silence and librosa will remove it. So librosa will remove more if we use 20 dB instead of 5 dB, which means that for 20 dB we should obtain less non-silent audio. But in my graph above, we get more when top_db=20 is set. – BarCodeReader Nov 20 '19 at 10:10
Correct me if I have misunderstood the docs. Thanks a lot. – BarCodeReader Nov 20 '19 at 10:20

@BarCodeReader, I think it means 20 dB below the reference value. If I understand the documentation correctly, the reference value can be set and is the maximum value by default. That would explain your observation. – Pieter21 Nov 20 '19 at 12:09
OK, I see your point. I will test this using the amplitude_to_db function or a similar one to check whether it really works like that. – BarCodeReader Nov 21 '19 at 05:10
