
I'm wondering if you can help advise whether aubio (or a similar tool) is right for our business. Sadly I'm not a developer or sound engineer, so please forgive my ignorance... but any feedback would be much appreciated!

Currently we take an audio file, for example a 1 hr recording of a conference, and chop it into shorter sections of audio. The problem with this is the brutal way in which the audio is dissected: if we chop a 60 min file into 5 minute sections, then every 5 minutes a word or sentence is likely to be cut in half. This results in a loss of quality, as it's impossible for the listener to decipher the half word or sentence.

I can see that the aubio site lists one of its features as "segmenting a sound file before each of its attacks". I'm wondering if aubio or something similar could be used to help us segment our audio files better? We would love to be able to slice/tag an audio file during a gap or pause in speech rather than mid-word.

Any advice would be much appreciated.

Kind regards Tom

1 Answer


The algorithm to detect silence is called "Voice Activity Detection" (VAD). If you search Google for it you can find many implementations, from simple to advanced, in many programming languages. For example, you can download the sphinxbase library from http://cmusphinx.sourceforge.net and use the bundled tool sphinx_cont_fileseg to segment a file into chunks:

   sphinx_cont_fileseg -i file.wav -w

There are other implementations too. As far as I can see, aubio doesn't include a VAD implementation, though you could probably build one using aubio classes. Aubio seems to be targeted more at music analysis and less at speech.
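To give a rough idea of what the simplest kind of VAD does, here is a minimal sketch in Python that marks stretches of audio as silence when their energy stays below a threshold. This is not what sphinx_cont_fileseg or aubio does internally; the function name `find_silences` and the parameters `frame_ms`, `threshold` and `min_silence_ms` are made up for illustration, and it assumes the audio has already been loaded as mono samples scaled to the range -1.0 to 1.0. Real recordings with background noise usually need something more robust.

```python
import math


def find_silences(samples, sample_rate, frame_ms=20, threshold=0.02,
                  min_silence_ms=300):
    """Return a list of (start_sec, end_sec) spans that look like silence.

    samples: mono audio as floats in [-1.0, 1.0] (an assumption of this sketch).
    A frame counts as silent when its RMS level is below `threshold`, and a
    run of silent frames is reported only if it lasts at least
    `min_silence_ms` milliseconds.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_frames = max(1, math.ceil(min_silence_ms / frame_ms))
    n_frames = len(samples) // frame_len

    silences = []
    run_start = None  # index of the first frame in the current silent run

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                silences.append((run_start * frame_len / sample_rate,
                                 i * frame_len / sample_rate))
            run_start = None

    # Close a silent run that reaches the end of the audio.
    if run_start is not None and n_frames - run_start >= min_frames:
        silences.append((run_start * frame_len / sample_rate,
                         n_frames * frame_len / sample_rate))
    return silences
```

The threshold and minimum pause length are the values you would tune by ear: too low a threshold misses pauses in a noisy room, too short a minimum picks up the tiny gaps between words.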

Once you have detected the silences you can cut on them; that part is trivial to implement. It's worth finding a developer, though.
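As an illustration of the cutting step, the sketch below picks cut times for roughly 5 minute sections by moving each cut to the middle of the nearest detected pause. It assumes you already have a list of silence spans as (start, end) pairs in seconds from whatever VAD you use; `choose_cut_points`, `target_sec` and `tolerance_sec` are hypothetical names, and falling back to a hard cut when no pause is close enough is just one possible policy.

```python
def choose_cut_points(silences, total_sec, target_sec=300.0,
                      tolerance_sec=30.0):
    """Pick cut times (in seconds) close to every `target_sec` of audio.

    silences: list of (start_sec, end_sec) pauses found by a VAD.
    Each cut is moved to the midpoint of the pause nearest to the ideal
    position, provided that pause lies within `tolerance_sec` of it.
    If no pause is close enough, the cut falls back to the ideal position.
    """
    midpoints = [(start + end) / 2 for start, end in silences]
    cuts = []
    last = 0.0

    # Keep cutting while more than one full section of audio remains.
    while total_sec - last > target_sec:
        ideal = last + target_sec
        candidates = [m for m in midpoints
                      if m > last and abs(m - ideal) <= tolerance_sec]
        if candidates:
            cut = min(candidates, key=lambda m: abs(m - ideal))
        else:
            cut = ideal  # no usable pause nearby: hard cut as a fallback
        cuts.append(cut)
        last = cut
    return cuts


# Example: a 1000 second recording with three detected pauses.
pauses = [(290.0, 292.0), (610.0, 612.0), (880.0, 881.0)]
print(choose_cut_points(pauses, total_sec=1000.0))
# prints [291.0, 611.0, 911.0]
```

The resulting times can then be handed to any audio tool that can split a file at given offsets.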

Nikolay Shmyrev