I have a shell script that:
- Takes an mp3 file and converts it to wav (using ffmpeg),
- Splits it into several chunks wherever it encounters silence (using sox), and
- Sends each of these chunks to the Google API to retrieve the corresponding text, which is appended to an output file.
Here is the code:
#!/bin/bash
# Usage: pass the input mp3 file as the first argument
TMPDIR=tmp
OUT=$TMPDIR/out
LANG=en-US
mkdir -p "$TMPDIR"
# Convert the mp3 input to wav
ffmpeg -i "$1" in.wav
echo "Audio has been extracted"
#1 0.1 1% 1 1.5 1%
# Split at silences into 16 kHz FLAC chunks
sox in.wav $OUT.flac rate 16k silence 1 0.1 1% 1 0.5 1% : newfile : restart
echo "sox has split the file"
# Send each chunk to the speech API and append the recognized text to the output file
for i in "$TMPDIR"/*; do
echo -n "$(wget -q -U "rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=$LANG&client=Mozilla/5.0" --post-file "$i" --header="Content-Type: audio/x-flac; rate=16000" | sed 's/.*utterance":"//' | sed 's/","confidence.*//') " >> "$1.txt"
echo "encoded $i chunk"
done
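For clarity, the text extraction inside the loop can be tried on its own. This is only a sketch: the sample reply is made up to match the fields the two sed expressions look for (the real API output may differ), and `extract_utterance` is a hypothetical helper name that is not part of the script.

```shell
# Made-up sample reply; the real API output may look different.
sample_reply='{"status":0,"id":"abc","hypotheses":[{"utterance":"hello world","confidence":0.92}]}'

# The same two sed expressions as in the loop, wrapped in a hypothetical helper.
extract_utterance() {
    # 1st sed: drop everything up to and including  utterance":"
    # 2nd sed: drop everything from  ","confidence  to the end of the line
    sed 's/.*utterance":"//' | sed 's/","confidence.*//'
}

printf '%s\n' "$sample_reply" | extract_utterance
# prints: hello world
```

If a reply line contains no "utterance" field, neither expression matches and the line passes through unchanged, so the raw reply would end up in the output file.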
This worked perfectly when I tried it last year. Now, with the same mp3 input file as a test, it no longer works. Specifically, I believe that something changed in the syntax of the sox command in recent versions.
I replaced the non-working line
sox in.wav $OUT.flac rate 16k silence 1 0.1 1% 1 0.5 1% : newfile : restart
with
sox in.wav -r 16000 $OUT.flac silence 1 0.1 1% 1 0.5 1% : newfile : restart
However, I still always get a single FLAC file in TMPDIR instead of many pieces.
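The symptom can be checked by counting the chunk files after the sox step. A minimal sketch, assuming the chunks end in `.flac` and sit directly in the `tmp` directory used by the script; `count_chunks` is a hypothetical helper, and `-maxdepth` assumes GNU or BSD find.

```shell
# Hypothetical helper: count the FLAC files directly inside a directory.
count_chunks() {
    local dir=$1
    # find prints one line per matching file; wc -l counts those lines.
    # tr strips the padding some wc implementations add.
    find "$dir" -maxdepth 1 -type f -name '*.flac' | wc -l | tr -d ' '
}

# Report the count for the script's tmp directory, if it exists.
if [ -d tmp ]; then
    echo "chunks in tmp: $(count_chunks tmp)"
fi
```

Running `sox --version` alongside this shows which sox release produced the result.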
Any hint on how this issue can be solved?