I have a customer-service conversation in a WAV file, which I split into its 2 audio channels. Now I have 2 WAV files, each containing one person speaking plus periods of silence. I need to cut out those silent periods to "compress" all of one person's words into a shorter file.

I googled and found this link. It has this code:

def addFrameWithTransition(self, image_file, audio_file, transition_file):
    media_info = MediaInfo.parse(transition_file)
    duration_in_ms = media_info.tracks[0].duration
    audio_file = audio_file.replace("\\", "/")
    try:
        audio_clip = AudioSegment.from_wav(r"%s"%audio_file)
        f = sf.SoundFile(r"%s"%audio_file)
    except Exception as e:
        print(e)
        audio_clip = AudioSegment.from_wav("%s/pause.wav" % settings.assetPath)
        f = sf.SoundFile("%s/pause.wav" % settings.assetPath)
    duration = (len(f) / f.samplerate)
    audio_clip_with_pause = audio_clip
    self.imageframes.append(image_file)
    self.audiofiles.append(audio_clip_with_pause)
    self.durations.append(duration)
    self.transitions.append((transition_file, len(self.imageframes) - 1, duration_in_ms / 1000)) 

But it needs some kind of 'image file'. Are there any other options?

ERJAN

1 Answer

I found a small vad.py script that splits a conversation into two and compresses each voice track. In the end you have 2 WAV files, each with only 1 person speaking.

https://github.com/mauriciovander/silence-removal/blob/master/vad.py

It works like this:

python vad.py name_of_new_file.wav
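
If you would rather not depend on that script, the same general idea (drop every short frame whose amplitude stays below a threshold) can be sketched with only the standard library. This is not the code from vad.py. The function names, the 30 ms frame length and the threshold of 500 are assumptions chosen for illustration, and the sketch assumes 16-bit mono WAV input.

```python
import array
import sys
import wave


def drop_silent_frames(samples, sample_rate, frame_ms=30, threshold=500):
    """Return only the frames whose peak amplitude exceeds `threshold`.

    `samples` is a sequence of signed 16-bit mono samples. `frame_ms` and
    `threshold` are illustrative defaults, not values taken from vad.py.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    kept = array.array("h")
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Peak amplitude is a crude loudness measure: any noise above the
        # threshold counts as activity, not only speech.
        if max(abs(s) for s in frame) > threshold:
            kept.extend(frame)
    return kept


def compress_wav(src_path, dst_path, **kwargs):
    """Copy a 16-bit mono WAV file, leaving out the quiet frames."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    kept = drop_silent_frames(samples, rate, **kwargs)
    if sys.byteorder == "big":
        kept.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(kept.tobytes())
    return len(kept) / rate  # duration of the output in seconds
```

Calling compress_wav("agent.wav", "agent_compressed.wav", threshold=800) would write the shortened file. The file names here are hypothetical, and the threshold needs tuning for each recording.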
ERJAN
    The code you shared is not, despite the name, a voice activity detector (VAD). It's a crude "activity" detector that will be triggered by any noise, not just voice, that exceeds the threshold. – Lukasz Tracewski Apr 11 '20 at 06:36