How to split an audio file based on silence and overlap the last, say, 2 seconds in Python

Question

Currently I am using this code to cut the audio file into small chunks:

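from os import path

import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

# folder (the output directory for the chunks) is defined elsewhere
sound = AudioSegment.from_wav("1.WAV")  # load the WAV file
chunks = split_on_silence(sound, min_silence_len=280, silence_thresh=-33, keep_silence=150)

r = sr.Recognizer()
with open("decoded.txt", "a+") as f:
    for i, chunk in enumerate(chunks):
        print(i)
        print("\n")
        # write the chunk to disk, then read it back for recognition
        chunk.export(folder + "/chunk{0}.wav".format(i), format="wav")
        AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), folder + "/chunk{0}.wav".format(i))
        with sr.AudioFile(AUDIO_FILE) as source:
            print("Listening...")
            audio = r.record(source)  # read the entire audio file
        f.write(r.recognize_google(audio) + " ")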

This creates chunks split on silence. But what I want is that whenever the audio is split, the next slice starts 2 seconds earlier, so that any word that might have been cut off is included. For example, if the silences are at 10, 13, 18 and 22 seconds, then my slices should be 0-10, 8-13, 11-18 and 16-22. I am using pydub to split on silence. Can I change something in pydub, or is there some other package that does this?

score 2 · Accepted Answer · answered Oct 24 '18 at 19:00

Since each chunk is split on silence, it will not contain the previous 2 seconds of audio.
However, what you can do is copy the last 2 seconds of the previous chunk (n-1) and prepend it to the next chunk (n), leaving the first chunk as it is.

Pseudocode as below,

n1 + n2 + n3 + ...n #audio chunks that are split on silence
n1 + (<last 2 seconds of n1> + n2) + (<last 2 seconds of n2> + n3) + ...
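A minimal Python sketch of the pseudocode above. The helper name add_overlap and its overlap_ms parameter are made up for illustration and are not part of pydub. It relies only on len(), slicing and +, which pydub's AudioSegment supports in millisecond units, so it can be tried on plain lists as well.

```python
def add_overlap(chunks, overlap_ms=2000):
    """Prefix every chunk except the first with the tail of the previous chunk.

    Hypothetical helper: works on any objects that support len(), slicing
    and +, such as pydub AudioSegment objects (measured in milliseconds).
    """
    overlapped = []
    previous = None
    for chunk in chunks:
        if previous is None:
            # The first chunk has nothing before it, so keep it unchanged.
            overlapped.append(chunk)
        else:
            # Clamp at 0 so a previous chunk shorter than the overlap
            # is simply copied whole.
            tail_start = max(0, len(previous) - overlap_ms)
            overlapped.append(previous[tail_start:] + chunk)
        previous = chunk
    return overlapped
```

With the question's code, this would be applied as chunks = add_overlap(chunks, 2000) right after split_on_silence and before the export loop. Note that the overlap is taken from the previous chunk's audio, so any silence that split_on_silence dropped between the two chunks is not restored.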

You can also play with keep_silence to see what value makes sense for your requirements.

Another idea is to use pydub.silence.detect_nonsilent() to find the non-silent ranges and make your own decisions about where to slice the original audio.

I'll leave that as a coding exercise for you.
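For reference, one way that exercise might start, as a sketch only. detect_nonsilent() returns a list of [start, end] pairs in milliseconds; the helper below (overlapped_ranges is a hypothetical name, not a pydub function) moves each start back by the overlap, clamped at 0, so each slice begins 2 seconds before its own non-silent range.

```python
def overlapped_ranges(ranges, overlap_ms=2000):
    """Move each range's start back by overlap_ms, never below 0.

    Hypothetical helper: ranges is a list of (start, end) pairs in
    milliseconds, the shape returned by pydub.silence.detect_nonsilent().
    """
    result = []
    for start, end in ranges:
        result.append((max(0, start - overlap_ms), end))
    return result
```

With pydub, the ranges would come from detect_nonsilent(sound, min_silence_len=280, silence_thresh=-33), and each slice would then be sound[start:end] on the original audio. For ranges 0-10, 10-13, 13-18 and 18-22 seconds this gives 0-10, 8-13, 11-18 and 16-22, matching the example in the question.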

I did my own research for a couple of days and came to the exact same conclusion as you. I just took an interval of 5 seconds and, before slicing the audio, set the start time 2 seconds earlier than it was. This seemed to work like a charm. Happy to know I was on the right track :) Thanks a lot - Abhijeet Sridhar, Oct 26 '18 at 04:05
