I was able to find the answer to this.
For split(), the default frame_length is 2048, and I had set hop_length to 100. split() divides the signal into frames in order to compute an RMS value for each one, and all of the random samples fall inside the first frame. At each subsequent hop, many zeros from the silent samples are included in the frame, which reduces the RMS value of the following frames. Hence, the first frame always has the highest energy. The expected samples can be obtained by:
- Increasing top_db, which includes more low-energy frames (as suggested by @jonnor)
- Increasing hop_length from 100 to 1000, so that the complete 1000 samples are included in one frame
- Reducing frame_length to get more resolution when computing the RMS. I could get the first 1050 samples with the command below:
librosa.effects.split(wave, top_db=10, frame_length=100, hop_length=50)
We might have to play with these parameters to get the desired result.
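To illustrate why frame_length matters, here is a minimal sketch of the per-frame RMS computation in plain Python. It is not librosa's implementation: it ignores the centering and padding that librosa applies by default, and the test signal (1000 random samples followed by zeros) and the helper name frame_rms_db are my own assumptions for illustration.

```python
import math
import random


def frame_rms_db(signal, frame_length, hop_length, floor=1e-10):
    """RMS of each frame, in dB relative to the loudest frame (no padding)."""
    rms = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    ref = max(rms)
    # Clamp to a small floor so all-zero frames do not hit log10(0).
    return [20 * math.log10(max(r, floor) / ref) for r in rms]


# Hypothetical test signal: 1000 random samples followed by silence.
rng = random.Random(0)
wave = [rng.uniform(-1, 1) for _ in range(1000)] + [0.0] * 4000

# Wide frames: the first frame holds every random sample, later ones hold fewer.
wide = frame_rms_db(wave, frame_length=2048, hop_length=100)
print([round(d, 1) for d in wide[:5]])

# Narrow frames: frames above -10 dB (top_db=10) hug the noisy region.
narrow = frame_rms_db(wave, frame_length=100, hop_length=50)
loud = [i for i, d in enumerate(narrow) if d > -10]
start, end = loud[0] * 50, loud[-1] * 50 + 100
print(start, end)
```

With the wide frames, the dB values fall steadily after the first frame, since each hop swaps random samples for zeros. With the narrow frames, the last frame above the threshold starts at sample 950 and ends at 1050, which would be consistent with the 1050 samples I got from librosa, although the exact boundaries librosa returns also depend on its padding.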