
To detect speech I'm playing with this sox command:

rec voice.wav silence 1 5 30% 1 0:00:02 30%

It should start recording whenever the input volume rises above the 30% threshold and stop 2 seconds after the audio falls below the same threshold.

It works, but it would be much better if it were "retriggerable". I mean: if the audio falls below the threshold and then rises again, it should continue recording (i.e. the user is still speaking).

It should stop only when it detects silence for a whole 2 seconds. Or do you recommend any other "VOX" tool?

Mark
The doc says: "For below-periods, duration specifies a period of silence that must exist before audio is not copied any more. By specifying a higher duration, silence that is wanted can be left in the audio. For example, if you have a song with an expected 1 second of silence in the middle and 2 seconds of silence at the end, a duration of 2 seconds could be used to skip over the middle silence." so it SHOULD work as I expect. – Mark May 03 '16 at 16:05

1 Answer


I've spent a lot of time experimenting with SoX to do VOX and have gotten it to work reasonably well. I've been using Audacity to view the resulting waveform, and have settled on the following SoX command:

rec snd.wav silence 1 .5 2.85% 1 1.0 3.0% vad gain -n : newfile : restart

This will:

  • wait until it hears activity above the threshold for a half second, then start recording (silence 1 .5 2.85%)
  • stop recording when the audio stays below the threshold for one second (... 1 1.0 3.0%)
  • trim off any initial silence up to voice detection (vad)
  • normalize the gain (gain -n)
  • store the result into a new file (snd001.wav, snd002.wav)
  • restart the process
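To make the pieces of that command easier to adjust, here is a minimal sketch that assembles it from named settings. The variable names and the vox_command helper are my own labels for illustration, not anything SoX defines; the values are the ones from the command above. The function only prints the command, so you can check it before running it.

```shell
# Illustrative settings; the names are assumptions, the values come from the answer.
START_DURATION=.5       # seconds of sound above the threshold before recording starts
START_THRESHOLD=2.85%   # level the input must exceed to count as sound
STOP_DURATION=1.0       # seconds below the threshold before recording stops
STOP_THRESHOLD=3.0%     # level the input must stay under to count as silence

# Hypothetical helper: print the rec command line for the given output file.
vox_command() {
    out="${1:-snd.wav}"
    printf 'rec %s silence 1 %s %s 1 %s %s vad gain -n : newfile : restart\n' \
        "$out" "$START_DURATION" "$START_THRESHOLD" "$STOP_DURATION" "$STOP_THRESHOLD"
}

# Show the command that would be run.
vox_command snd.wav
```

To actually record, something like eval "$(vox_command snd.wav)" should work. Going by the documentation quoted in the comment on the question, raising STOP_DURATION (for example to 2.0) is the knob that should let pauses shorter than that value stay inside one recording.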

Getting the "silence" numbers correct involved a lot of trial and error, and will depend on ambient noise as well as the sensitivity of your microphone. I'm using the microphone in the Logitech QuickCam IM on a Raspberry Pi through USB.
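One way to shorten the trial and error might be to measure the room noise first and start from there. The sketch below is only a heuristic of mine, not something from the SoX docs: it reads the output of the stat effect, takes the largest of the "Maximum amplitude" and "Minimum amplitude" magnitudes, and multiplies it by a safety margin. The suggest_threshold name and the default margin of 1.5 are assumptions.

```shell
# Hypothetical helper: read `stat` output on stdin and print a suggested
# threshold percentage = ambient peak magnitude * margin (default 1.5).
suggest_threshold() {
    margin="${1:-1.5}"
    awk -v m="$margin" '
        /^(Maximum|Minimum) amplitude:/ {
            v = $3 + 0              # third field is the amplitude value
            if (v < 0) v = -v       # compare magnitudes
            if (v > peak) peak = v
            seen = 1
        }
        END {
            if (!seen) exit 1       # no amplitude lines found
            printf "%.2f%%\n", peak * 100 * m
        }
    '
}

# Usage, assuming a quiet room (stat writes its report to stderr):
#   rec -n trim 0 3 stat 2>&1 | suggest_threshold
```

Treat the result as a starting point only; you will most likely still need to adjust it by ear.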

As a side note, rec kept failing with the following error...

rec FAIL formats: can't open input  `default': snd_pcm_open error: No such file or directory

... until I created this variable in the environment:

export AUDIODEV=hw:1,0
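The card and device numbers will differ between machines, so hw:1,0 is just what worked for my setup. A small sketch of how the variable could be set without overwriting a value you already have; the alsa_hw helper is a made-up name for illustration.

```shell
# List capture devices first (arecord is part of alsa-utils):
#   arecord -l
# A line such as "card 1: ... device 0: ..." corresponds to hw:1,0.

# Hypothetical helper: build an ALSA hardware device name from card and device numbers.
alsa_hw() {
    printf 'hw:%s,%s\n' "$1" "$2"
}

# Keep any AUDIODEV that is already set; otherwise fall back to card 1, device 0.
: "${AUDIODEV:=$(alsa_hw 1 0)}"
export AUDIODEV
echo "Recording from $AUDIODEV"
```

Putting these lines in your shell profile should make the setting survive a reboot.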

Again - this involved a lot of experimentation with the values for "silence", and it WILL need some tweaking for your environment.

mnr