I have a shell script that:

  1. Takes an mp3 file and converts it to wav (using ffmpeg),
  2. Splits it into several chunks wherever it encounters silence (using sox), and
  3. Sends each of these chunks to the Google API to retrieve the corresponding text, which is appended to an output file.

Here is the code:

#!/bin/bash
# Note: TMPDIR and LANG are also standard environment variables;
# assigning them here changes them for every command the script runs.
TMPDIR=tmp
OUT="$TMPDIR/out"
LANG=en-US

mkdir -p "$TMPDIR"
ffmpeg -i "$1" in.wav
echo "Audio has been extracted"

#1 0.1 1% 1 1.5 1%
sox in.wav "$OUT.flac" rate 16k silence 1 0.1 1% 1 0.5 1% : newfile : restart
echo "sox has split the file"

for i in "$TMPDIR"/*; do
    # Post the chunk to the speech API and keep only the recognized utterance
    text=$(wget -q -U "rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=$LANG&client=Mozilla/5.0" --post-file "$i" --header="Content-Type: audio/x-flac; rate=16000" | sed 's/.*utterance":"//' | sed 's/","confidence.*//')
    echo -n "$text " >> "$1.txt"
    echo "encoded $i chunk"
done

This worked perfectly when I tried it last year. Now it no longer works, even with the same mp3 input file as a test. Specifically, I believe that something changed in the syntax of the latest versions of the sox command.

I replaced the non-working line

sox in.wav $OUT.flac rate 16k silence 1 0.1 1% 1 0.5 1% : newfile : restart 

with

sox in.wav -r 16000 $OUT.flac silence 1 0.1 1% 1 0.5 1% : newfile : restart 

However, I still get a single flac file in TMPDIR instead of many pieces.

Any hint on how this issue can be solved?

Thomas Ayoub
Albz
  • so the problem is with `sox` and not the google API? If so, then something about `sox` changed. Did you upgrade your OS since it used to work? Are you running this on the cloud, and just using a default environment? Etc, etc. Good luck. – shellter Mar 07 '14 at 14:47
  • Yes, I upgraded the OS and I'm sure something changed with the syntax of sox, however after trying many times, I was not able to fix the issue. – Albz Mar 07 '14 at 15:30
  • Has the description of features in `man sox` changed in areas that would affect the arguments you are using? Maybe `silence` is now replaced with `-s`? There is also a specialized website for `sox`. The address might be in the `sox` README.TXT file; otherwise googling should bring it up. Good luck. – shellter Mar 07 '14 at 16:19

2 Answers

Have you experimented with adjusting the threshold values for silence detection? Perhaps their update has made that calculation more sensitive.

JoshOfAllTrades
  • I did but it didn't work. I used the same values as now some time ago and obtained good results, now I retrieve an empty file in JSON format. – Albz Apr 08 '14 at 16:46
It turns out I was using an outdated version of SoX that came bundled with the old OS installed on the server I was using. That version (v14.0.0) did not support

: newfile : restart 

which lets SoX trim a file repeatedly, writing each chunk to a new output file. This was the first problem, and it was solved simply by updating SoX to the latest version.
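
A minimal sketch of how one might check the installed SoX version before relying on `: newfile : restart`. The helper name `version_ge`, the minimum version `14.4.2` (simply a recent release, not necessarily the first one that supports these pseudo-effects), and the assumption that `sox --version` prints a string containing `v<major>.<minor>.<patch>` are all illustrative assumptions:

```shell
#!/bin/bash
# Hypothetical minimum version; adjust after checking the SoX changelog.
MIN_SOX=14.4.2

# version_ge A B: succeeds if dotted numeric version A is >= B.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# Only run the check when sox is actually installed.
if command -v sox >/dev/null 2>&1; then
    # Assumes the version appears as "v14.x.y" somewhere in the output.
    installed=$(sox --version 2>&1 | sed -n 's/.*v\([0-9][0-9.]*\).*/\1/p' | head -n1)
    if [ -n "$installed" ] && ! version_ge "$installed" "$MIN_SOX"; then
        echo "SoX $installed is older than $MIN_SOX; consider upgrading" >&2
    fi
fi
```

If the warning appears, upgrading SoX is the first thing to try before touching the `silence` parameters.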

The second problem with the script I posted above is the Google Speech API: v1 has been removed and replaced with v2, which requires a personal developer key and limits the number of daily queries (50). To learn more about Google Speech API v2, have a look here: https://github.com/gillesdemey/google-speech-v2
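
Since the `sed` pipeline in the original script looks for the v1 `utterance` field, the text extraction would need adapting too. A rough sketch, assuming v2 responses carry the recognized text in a `"transcript":"..."` field (check the linked repository; this is not verified against the live API). The function name `extract_transcript` is hypothetical, and transcripts containing escaped quotes are not handled:

```shell
#!/bin/bash
# Reads a JSON response on stdin and prints the first transcript value.
# Assumes the field looks like "transcript":"some text" with no spaces
# around the colon and no escaped quotes inside the text.
extract_transcript() {
    grep -o '"transcript":"[^"]*"' | head -n1 | sed 's/^"transcript":"//; s/"$//'
}
```

It would take the place of the two `sed` calls after `wget` in the loop, for example `wget ... | extract_transcript`.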

Albz