Why does librosa.effects.split return incorrect non-silent intervals?

Question

I used the code from here and changed hop_length to 100. Why do I get incorrect output?

Specifically, the change is:

print(wave.shape)
non_silent_interval = librosa.effects.split(wave, top_db=0.1, hop_length=100) 
print(non_silent_interval)

I get the following output:

(2000,)
[[  0 100]]

But the signal contains 1000 nonzero samples. Why does it suggest that the non-silent samples lie only between 0 and 100?

The signal has shape (2000,): it is the concatenation of 1000 non-silent samples and 1000 silent samples. – Vinay, May 25 '20 at 06:35

score 0 · Answer 1 · answered May 25 '20 at 06:48

By default, librosa.effects.split uses the maximum of your signal as the reference point (0 dB). top_db sets the silence threshold relative to that reference: values more than top_db below it are considered silent. So with top_db=0.1, only frames between -0.1 dB and 0.0 dB are considered non-silent, which is incredibly unlikely. That this happens for the first frame is probably just luck.

Use a larger value for the silence threshold top_db, such as 24, or whatever fits the dynamic range of your recordings.
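To make the threshold concrete, here is a minimal NumPy-only sketch of the comparison described above. It is not librosa's actual code: the helper names `db_relative_to_max` and `non_silent_mask` and the per-frame RMS values are hypothetical, chosen only for illustration.

```python
import numpy as np

def db_relative_to_max(frame_rms):
    """Convert per-frame RMS values to dB, with the loudest frame at 0 dB."""
    frame_rms = np.asarray(frame_rms, dtype=float)
    ref = frame_rms.max()
    # The small floor avoids log10(0) for all-zero frames.
    return 20.0 * np.log10(np.maximum(frame_rms, 1e-10) / ref)

def non_silent_mask(frame_rms, top_db):
    """A frame is non-silent if it is less than top_db below the loudest frame."""
    return db_relative_to_max(frame_rms) > -top_db

# Hypothetical per-frame RMS values (illustration only).
frame_rms = [1.0, 0.95, 0.5, 0.1, 0.0]

mask_tight = non_silent_mask(frame_rms, top_db=0.1)
mask_loose = non_silent_mask(frame_rms, top_db=24)

print(mask_tight)  # only the loudest frame survives
print(mask_loose)  # every frame except the all-zero one survives
```

Even a frame at 95% of the peak RMS sits about 0.45 dB below the reference, so a threshold of 0.1 dB rejects it.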

score 0 · Accepted Answer · answered May 25 '20 at 11:44

I was able to find the answer to this.

For split(), the default frame_length is 2048, and I chose a hop_length of 100. To compute RMS values, split() divides the signal into frames, and all of the random (non-silent) samples fall within the first frame. Each subsequent frame is shifted by one hop, so it contains more zeros from the silent samples, which lowers its RMS value. Hence the first frame always has the highest energy. The appropriate samples can be obtained by:

Increasing top_db, which includes more low-energy frames (as suggested by @jonnor)
Increasing hop_length from 100 to 1000, so that all 1000 samples are included in one frame
Reducing frame_length for finer resolution when computing RMS. I got the first 1050 samples with the command below:

librosa.effects.split(wave, top_db=10, frame_length=100, hop_length=50)

You may have to experiment with these parameters to get the desired result.
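The effect of frame_length and hop_length can be sketched in plain NumPy. This is a simplified stand-in, not librosa's implementation: `split_sketch` is a hypothetical helper, it assumes centred frames with reflect padding (the padding used by your installed librosa version may differ, which would shift the exact boundaries), and it uses a constant-amplitude signal instead of random samples so the result is deterministic.

```python
import numpy as np

def split_sketch(y, top_db, frame_length, hop_length, pad_mode="reflect"):
    """Simplified RMS-based splitting; assumes centred, padded frames."""
    # Pad both ends so frame i is centred on sample i * hop_length.
    padded = np.pad(y, frame_length // 2, mode=pad_mode)
    n_frames = 1 + (len(padded) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(padded[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # dB relative to the loudest frame; the floors avoid log10(0) and 0/0.
    ref = max(rms.max(), 1e-10)
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / ref)
    non_silent = db > -top_db

    # Turn runs of non-silent frames into [start, end) sample intervals.
    flags = np.concatenate(([False], non_silent, [False]))
    edges = np.flatnonzero(flags[1:] != flags[:-1])
    intervals = edges.reshape(-1, 2) * hop_length
    return np.minimum(intervals, len(y))

# 1000 non-silent samples followed by 1000 silent ones, as in the question.
wave = np.concatenate([np.ones(1000), np.zeros(1000)])

long_frames = split_sketch(wave, top_db=0.1, frame_length=2048, hop_length=100)
short_frames = split_sketch(wave, top_db=10, frame_length=100, hop_length=50)

print(long_frames)   # [[  0 100]]
print(short_frames)  # [[   0 1050]]
```

With the long frame, the second frame is already about 0.17 dB below the first under these assumptions, so a 0.1 dB threshold keeps only frame 0. With short frames and a 10 dB threshold, the half-filled frame centred on sample 1000 still counts as non-silent, which is why the interval ends at 1050 rather than 1000.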
