How to fix a shell script that divides mp3 files into chunks and sends them to Google speech to text API to retrieve their content in textual form?

Question

I have a shell script that:

1. Takes an mp3 file and converts it to WAV (using ffmpeg);
2. Splits it into several chunks wherever it encounters silence (using sox); and
3. Sends each of these chunks to the Google API to retrieve the corresponding text, which is appended to an output file.

Here is the code:

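#!/bin/bash
# Usage: ./script.sh input.mp3   (the transcript is appended to input.mp3.txt)
TMPDIR=tmp
OUT=$TMPDIR/out
SPEECH_LANG=en-US   # recognition language (renamed so it does not clobber the locale variable LANG)

mkdir -p "$TMPDIR"
ffmpeg -y -i "$1" in.wav   # -y overwrites in.wav from a previous run instead of prompting
echo Audio has been extracted

# Alternative silence parameters: 1 0.1 1% 1 1.5 1%
# Resample to 16 kHz and start a new numbered FLAC chunk after each 0.5 s pause
sox in.wav "$OUT.flac" rate 16k silence 1 0.1 1% 1 0.5 1% : newfile : restart
echo sox has split the file

for i in "$TMPDIR"/*; do
  # Post the chunk to the v1 endpoint and keep only the "utterance" text of the reply
  text=$(wget -q -U "rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=$SPEECH_LANG&client=Mozilla/5.0" \
    --post-file "$i" --header="Content-Type: audio/x-flac; rate=16000" \
    | sed 's/.*utterance":"//' | sed 's/","confidence.*//')
  echo -n "$text " >> "$1.txt"
  echo "encoded $i chunk"
done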
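The two `sed` calls in the loop keep only the text between `utterance":"` and `","confidence`. One caveat: when a reply contains no utterance, neither pattern matches, so the whole raw JSON line is appended to the transcript. A minimal sketch of a single-`sed` alternative that prints nothing in that case; the function name `extract_utterance` and the sample reply shape used to try it are assumptions for illustration, not taken from Google's documentation.

```shell
#!/bin/bash
# Hypothetical helper: print the "utterance" value from a one-line JSON reply.
# Prints nothing when the reply has no utterance (e.g. an empty hypotheses list).
# Like the original pipeline, the greedy ".*" selects the last utterance if there are several.
# Utterances containing escaped quotes (\") are not handled.
extract_utterance() {
  printf '%s\n' "$1" | sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p'
}
```

In the loop, the `wget` output could be captured in a variable and passed to `extract_utterance` in place of the two `sed` calls.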

This worked perfectly when I tried it last year. Now, however, it no longer works (with the same mp3 file as test input). I believe the syntax of the sox command changed in recent versions.

I replaced the non-working line

sox in.wav $OUT.flac rate 16k silence 1 0.1 1% 1 0.5 1% : newfile : restart

with

sox in.wav -r 16000 $OUT.flac silence 1 0.1 1% 1 0.5 1% : newfile : restart

However, only a single FLAC file is ever generated in TMPDIR, instead of many chunks.
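To check this symptom directly after the sox step, the number of FLAC chunks it produced can be counted. A minimal sketch; `count_chunks` is a hypothetical helper name, and it assumes the chunks end in `.flac`, as the `$OUT.flac` output name implies.

```shell
#!/bin/bash
# Hypothetical helper: print how many .flac files exist in directory $1.
count_chunks() {
  local dir=$1 n=0 f
  for f in "$dir"/*.flac; do
    # With no match the glob stays literal, so check that the file really exists.
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
}
```

For example, `[ "$(count_chunks "$TMPDIR")" -gt 1 ] || echo "sox did not split the file" >&2` placed right after the sox line would report the problem immediately.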

Any hint on how this issue can be solved?

So the problem is with `sox` and not the Google API? If so, then something about `sox` changed. Did you upgrade your OS since it last worked? Are you running this in the cloud, just using a default environment? Etc., etc. Good luck. — shellter, Mar 07 '14 at 14:47
Yes, I upgraded the OS, and I'm sure something changed in the sox syntax; however, even after many attempts I was not able to fix the issue. — Albz, Mar 07 '14 at 15:30
Has the description of features in `man sox` changed in areas that would affect the arguments you are using? Maybe `silence` has been replaced with `-s`? There is also a dedicated website for `sox`; the address might be in the `sox` README.TXT file, or else googling should bring it up. Good luck. — shellter, Mar 07 '14 at 16:19

score 0 · Answer 1 · answered Apr 08 '14 at 15:03


Have you experimented with adjusting the threshold values for silence detection? Perhaps their update has made that calculation more sensitive.
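In `silence 1 0.1 1% 1 0.5 1%`, the first triple trims leading audio until 0.1 s above the 1% threshold is detected, and the second ends a chunk once 0.5 s stays below 1%. A sketch that exposes those two values as parameters makes experimenting easier; `split_on_silence` and the `SOX` override variable (used here for a dry run with `echo`) are hypothetical, not part of sox.

```shell
#!/bin/bash
# Hypothetical wrapper around the script's sox call.
#   $1 input file, $2 output prefix (chunks are numbered files based on $2.flac)
#   $3 silence threshold (default 1%), $4 pause in seconds that ends a chunk (default 0.5)
# Set SOX=echo to print the command instead of running it.
split_on_silence() {
  local in=$1 out=$2 thr=${3:-1%} pause=${4:-0.5}
  "${SOX:-sox}" "$in" "$out.flac" rate 16k \
    silence 1 0.1 "$thr" 1 "$pause" "$thr" : newfile : restart
}
```

`SOX=echo split_on_silence in.wav tmp/out 2% 0.3` prints the resulting command. A lower threshold treats fewer samples as silence (fewer splits), and a shorter pause splits at briefer gaps (more chunks).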


JoshOfAllTrades


I did, but it didn't work. I used these same values some time ago and got good results; now I get back an empty JSON file. – Albz Apr 08 '14 at 16:46

score 0 · Accepted Answer · answered Aug 24 '14 at 16:56

It turns out I was using an outdated version of SoX that came bundled with the old OS installed on the server I was using. That version (v14.0.0) did not support

: newfile : restart

which lets sox keep processing after each detected silence, writing every segment to a new file. This was one problem, and it was solved simply by updating SoX to the latest version.
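Since the failure came from an old SoX, a script can check the installed version before relying on `newfile`. A minimal sketch, assuming `sox --version` prints a line like `sox:      SoX v14.4.2` and that GNU `sort -V` is available; the helper names and the 14.1.0 minimum are assumptions (the answer only establishes that v14.0.0 lacked support).

```shell
#!/bin/bash
# Hypothetical helper: extract "14.4.2" from text such as "sox:      SoX v14.4.2".
sox_version_from() {
  printf '%s\n' "$1" | sed -n 's/.*SoX v\([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed if version $1 >= version $2 (relies on GNU sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Warn and fail if the installed sox is older than $1 (14.1.0 is an assumed cut-off).
check_sox() {
  local v
  v=$(sox_version_from "$(sox --version 2>&1)")
  if [ -z "$v" ] || ! version_at_least "$v" "${1:-14.1.0}"; then
    echo "sox ${v:-(not found)} may not support ': newfile : restart'; consider upgrading" >&2
    return 1
  fi
}
```

Calling `check_sox || exit 1` near the top of the script would stop it before sox silently produces a single file.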

The second problem with the script posted above is the Google Speech API itself: v1 has been removed and replaced with v2, which requires a personal developer key and is limited to 50 queries per day. To learn more about Google Speech API v2, have a look here: https://github.com/gillesdemey/google-speech-v2
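For reference, a minimal sketch of sending one chunk to v2 with `curl`, assuming the request format described in the linked repository (an unofficial endpoint that takes a FLAC body and a `key` query parameter; it may have changed or stopped responding since). The function names and the `GOOGLE_SPEECH_KEY` variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the v2 URL for API key $1 and language $2.
speech_v2_url() {
  printf 'https://www.google.com/speech-api/v2/recognize?output=json&lang=%s&key=%s\n' "$2" "$1"
}

# Hypothetical helper: POST one 16 kHz FLAC chunk ($1) and print the raw JSON reply.
# Reads the developer key from GOOGLE_SPEECH_KEY; the language ($2) defaults to en-us.
recognize_v2() {
  curl -s -X POST --data-binary @"$1" \
    --header 'Content-Type: audio/x-flac; rate=16000' \
    "$(speech_v2_url "$GOOGLE_SPEECH_KEY" "${2:-en-us}")"
}
```

With the 50-queries-per-day limit, a recording that sox splits into many chunks can use up the quota quickly, so larger chunks (a longer pause parameter) may be preferable.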

