I love pydub. It is simple to understand. But when it comes to detecting non-silent chunks, librosa seems much faster. So I want to try using librosa in a project to speed my code up.
So far, I have been using pydub like this (segment
is an AudioSegment
):
thresh = segment.dBFS - (segment.max_dBFS - segment.dBFS)
non_silent_ranges = pydub.silence.detect_nonsilent(segment, min_silence_len=1000, silence_thresh=thresh)
The thresh
formula works mostly well, and when it does not, moving it a 5 or so dbs up or down does the trick.
Using librosa, I am trying this (y
is a numpy array loaded with librosa.load()
, with an sr
of 22050)
non_silent_ranges = librosa.effects.split(y, frame_length=sr, top_db=mistery)
To get similar results to pydub I tried setting mistery
to the following:
mistery = y.mean() - (y.max() - y.mean())
and the same after converting y to dbs:
ydbs = librosa.amplitude_to_db(y)
mistery = ydbs.mean() - (ydbs.max() - ydbs.mean())
In both cases, the results are very different from what get from pydub.
I have no background in audio processing and although I read about rms, dbFS, etc, I just don't get it--I guess I am getting old:)
Could somebody point me in the right direction? What would be the equivalent of my pydub solution in librosa? Or at least, explain to me how to get the max_dBFS
and dBFS
values of pydub in librosa (I am aware of how to convert and AudioSegment
to the equivalent librosa numpy array thanks to the excellent answer here)?