I am trying to do a project, and in part of the project I have the user say a word which gets recorded. This word then gets the silence around it cut out, and there is a button that plays back their word without the silence. I am using librosa's librosa.effects.trim command to achieve this.

For example:

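# imports assumed from the names used below
import librosa
import sounddevice as sd
from playsound import playsound

# fs, seconds, beep1 and beep2 are defined elsewhere in the project
def record_audio():
    global myrecording
    global yt
    playsound(beep1)  # cue: start speaking
    myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
    sd.wait()  # block until the recording has finished
    playsound(beep2)  # cue: recording done

    # trim leading and trailing silence (this is the step that has no effect)
    yt, index = librosa.effects.trim(myrecording, top_db=60)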

However, when I play the audio back, I can tell that it is not trimming the recording. The variable explorer shows that myrecording and yt are the same length. I can hear it when I play what is supposed to be the trimmed audio clip back as well. I don't get any error messages when this occurs either. Is there any way to get librosa to actually clip the audio? I have tried adjusting top_db and that did not fix it. Aside from that, I am not quite sure what I could be doing wrong.

EMC

1 Answer

For a real answer, you'd have to post a sample recording so that we could inspect what exactly is going on.

In lieu of that, I'd like to refer to this GitHub issue, where one of the main authors of librosa offers advice for a very similar issue.

In essence: You want to lower the top_db threshold and reduce frame_length and hop_length. E.g.:

yt, index = librosa.effects.trim(myrecording, top_db=50, frame_length=256, hop_length=64)

Decreasing hop_length effectively increases the time resolution of the trimming, because the cut points can only fall on frame boundaries. Decreasing top_db makes the function less sensitive, i.e., low-level noise is also regarded as silence. With a computer microphone, you probably do have quite a bit of low-level background noise.
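
To make the effect of these parameters concrete, here is a minimal NumPy sketch of frame-wise, level-based trimming. It is not librosa's implementation (librosa centers and pads its frames, among other differences); trim_by_rms, the synthetic signal, and the noise level are all made up for illustration.

```python
import numpy as np

def trim_by_rms(y, top_db=60.0, frame_length=2048, hop_length=512):
    """Illustrative only: keep the span between the first and last frame
    whose RMS level is within top_db of the loudest frame."""
    y = np.asarray(y, dtype=float).reshape(-1)  # force a 1-D mono signal
    if y.size == 0:
        return y, (0, 0)
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # Level of every frame in dB relative to the loudest frame
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    loud = np.flatnonzero(db > -top_db)
    if loud.size == 0:
        return y[:0], (0, 0)
    start = loud[0] * hop_length
    end = min(len(y), loud[-1] * hop_length + frame_length)
    return y[start:end], (start, end)

# Synthetic stand-in for a recording: background noise with a 0.5 s tone at 1.0 s
fs = 8000
rng = np.random.default_rng(0)
y = 0.003 * rng.standard_normal(int(2.5 * fs))
t = np.arange(int(0.5 * fs)) / fs
y[fs : fs + len(t)] += 0.5 * np.sin(2 * np.pi * 440 * t)

kept_60, span_60 = trim_by_rms(y, top_db=60, frame_length=256, hop_length=64)
kept_30, span_30 = trim_by_rms(y, top_db=30, frame_length=256, hop_length=64)
print(len(y), len(kept_60), len(kept_30))
```

In this synthetic case the noise sits roughly 40 dB below the tone, so top_db=60 treats it as signal and removes essentially nothing, while top_db=30 cuts to within one frame of the tone. The small hop_length is what keeps the cut points close to the actual onset.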

If none of this helps, you might want to consider using SoX, or its Python wrapper pysox. It also has a trim function.

Update: Look at the waveform of your audio. Does it have a spike somewhere at the beginning? A click or crackle, perhaps. That will keep librosa from trimming correctly, because the spike itself counts as non-silent. Manually throwing away the first second (= fs samples) and then trimming may solve the issue:

librosa.effects.trim(myrecording[fs:], top_db=50, frame_length=256, hop_length=64)
Hendrik
  • This solution is closer to what I am looking for, but still does not quite solve it. The first one did not trim. The second one gave better results, but that was mainly because I had learned where the one-second mark was and knew when to speak, which the user would not know. I will try pysox and see how that works – EMC Jun 24 '20 at 23:04
  • Now that I think about it, it may be that the fans on my computer are too loud and that could be interfering with the trimming – EMC Jun 24 '20 at 23:06
  • There may also be a spike at the end preventing trailing silence removal. This is usually a percussive peak caused by the microphone detecting a button press to stop recording. This depends on what the recording device is, but it is something I have encountered. – Alice Aug 18 '23 at 17:56