I used code from here and modified hop_length to 100. Why do I get incorrect outputs?

Specifically, the change is:

print(wave.shape)
non_silent_interval = librosa.effects.split(wave, top_db=0.1, hop_length=100) 
print(non_silent_interval)

I get this output:

(2000,)
[[  0 100]]

But the signal contains 1000 nonzero samples. Why does it suggest that the non-silent samples are only between 0 and 100?

Vinay

2 Answers

By default, librosa.effects.split uses the maximum of your signal as its reference point (0 dB). top_db sets the silence threshold: frames whose level is below -top_db are considered silent. So with top_db=0.1, only frames between -0.1 dB and 0.0 dB are considered non-silent, which is incredibly unlikely for anything but the loudest frame. That this happens for the first frame is probably just luck.

Use a larger value for the silence threshold top_db, such as 24, or whatever fits the dynamic range of your recordings.
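To make the thresholding rule concrete, here is a minimal pure-Python sketch. It is not librosa's implementation: the helper name non_silent_frames and the example RMS values are made up for illustration, and it only assumes the rule described above, where a frame counts as non-silent when its level relative to the loudest frame is above -top_db.

```python
import math

def non_silent_frames(frame_rms, top_db):
    """Return one bool per frame: True if the frame is within top_db of the loudest.

    Hypothetical helper for illustration only, not part of librosa.
    """
    ref = max(frame_rms)  # the loudest frame defines the 0 dB reference
    flags = []
    for rms in frame_rms:
        if rms <= 0:
            flags.append(False)  # digital silence is infinitely far below the reference
            continue
        level_db = 20.0 * math.log10(rms / ref)  # level relative to the reference
        flags.append(level_db > -top_db)
    return flags

# Made-up per-frame RMS values: a loud start that decays into silence.
frame_rms = [0.50, 0.49, 0.35, 0.10, 0.0]

strict = non_silent_frames(frame_rms, top_db=0.1)
relaxed = non_silent_frames(frame_rms, top_db=24)
print(strict)
print(relaxed)
```

With top_db=0.1, even the frame at 0.49 (about 0.18 dB below the loudest) is classified as silent, while top_db=24 keeps every frame except the all-zero one.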

Jon Nordby

I was able to find an answer to this.

For split(), the default frame_length is 2048, and I chose a hop_length of 100. split() divides the signal into frames in order to compute an RMS value per frame, and all of the random (nonzero) samples fall within the first frame. Each subsequent hop pulls more zeros from the silent part into the frame, which lowers the RMS value of the following frames. Hence the first frame always has the highest energy. The appropriate samples can be obtained by:

  1. Increasing top_db, which includes more low-energy frames (as suggested by @jonnor)
  2. Increasing hop_length from 100 to 1000, so that the complete 1000 samples are included in one frame
  3. Reducing frame_length to get finer resolution when computing the RMS. I could get the first 1050 samples with the command below:

librosa.effects.split(wave, top_db=10, frame_length=100, hop_length=50)

You may have to experiment with these parameters to get the desired result.
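The effect of frame_length on the per-frame RMS can be sketched without librosa. The code below is a simplified stand-in, not what split() does internally: it skips the padding and centering librosa applies, the helpers frame_rms and count_non_silent are hypothetical, and the signal layout (1000 nonzero samples followed by 1000 zeros) is an assumption, since the original test signal is not shown here.

```python
import math

def frame_rms(signal, frame_length, hop_length):
    """RMS of each full frame, without the padding/centering librosa applies."""
    values = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return values

def count_non_silent(values, top_db):
    """Count frames whose level is within top_db of the loudest frame."""
    ref = max(values)
    return sum(1 for v in values if v > 0 and 20 * math.log10(v / ref) > -top_db)

# Assumed layout: 1000 nonzero samples followed by 1000 zeros.
signal = [1.0] * 1000 + [0.0] * 1000

# Long frames: as soon as the frame hops forward it picks up zeros and its RMS drops.
long_frames = frame_rms(signal, frame_length=1000, hop_length=100)
# Short frames: most frames sit entirely inside the nonzero region.
short_frames = frame_rms(signal, frame_length=100, hop_length=50)

n_long = count_non_silent(long_frames, top_db=0.1)
n_short = count_non_silent(short_frames, top_db=10)
print(n_long, n_short)
```

In this sketch the long frames with top_db=0.1 leave only the first frame above the threshold, because the second frame is already about 0.46 dB quieter. The short frames with top_db=10 keep every frame that overlaps the nonzero region by at least half.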

Vinay