I have been working on a small voice recognition program, similar to Siri or Amazon Echo, that would let me simplify several small tasks around my home. I am extremely new to bash, so I would like some help reducing the continuous flow of data to the Google Speech To Text servers. Currently, I record a new audio file every three seconds and send it to the Google servers to be transcribed. This method seems very inefficient. This portion of the code is shown below.

while :
do
        trap CTRLc INT
        echo "[speech-recog]: Recording"
        (arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
        sleep $sleepduration
        echo "[speech-recog]: Recording"
        (arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
        sleep $sleepduration
done

Instead, I hypothesized that making this script voice triggered would greatly reduce the amount of internet traffic on my network. By voice triggered, I mean that it begins recording audio to send to Google only upon hearing a sound of a specific volume or higher. It would be extremely helpful if anyone could suggest how I should go about creating this sound trigger, or how to reduce the number of requests to these servers in general.

Furthermore, the current method results in some audio being split into two or more files because the recording may start at any time before the speaker begins. Triggering the recording upon hearing a sound would also fix this problem.

Any and all suggestions related to my code are welcome. If any further information is necessary, please request it in the comments and I will be happy to provide you with anything you need to know. If you have any problems with my question, please leave a comment so I know not to make that mistake in the future. The bash script is shown below.

Note: The objective of this script is to write the response from the Google Speech to Text servers to a file called "SpeechLog.txt"

speech-recog.sh

#!/bin/bash
hardware="plughw:1,0"
duration="3"
sleepduration="3.05"
lang="en"
hw_bool=0
dur_bool=0
lang_bool=0
CTRLc() {
        echo "[speech-recog]: Terminating Faide master script. Are you sure (yes/no)?"
        read ShouldQuit
        if [ "${ShouldQuit^^}" = "YES" ]
        then
                echo "[speech-recog]: Confirmation accepted, terminating script"
                sudo python3 Cleanup.py
                kill $$
        else
                echo "[speech-recog]: Denial accepted. Exiting confirmation request"
                clear
                echo "[speech-recog]: Listening..."
        fi
}
for var in "$@"
do
    if [ "$var" == "-D" ] ; then
        hw_bool=1
    elif [ "$var" == "-d" ] ; then
        dur_bool=1
    elif [ "$var" == "-l" ] ; then
        lang_bool=1
    elif [ $hw_bool == 1 ] ; then
        hw_bool=0
        hardware="$var"
    elif [ $dur_bool == 1 ] ; then
        dur_bool=0
        duration="$var"
    elif [ $lang_bool == 1 ] ; then
        lang_bool=0
        lang="$var"
    else
        echo "[speech-recog]: Invalid option, valid options are -D for hardware, -d for duration and -l for language"
    fi
done
CheckFile() {
        LineCount=$(wc -l < SpeechLog.txt)
        if [ $LineCount -gt 1 ]
        then
                sudo rm /dev/shm/out.flac
                sudo python3 VoiceMain.py
        fi
}
clear
echo "[speech-recog]: Speech recognition initialized"
echo "[speech-recog]: Listening..."
while :
do
        trap CTRLc INT
        echo "[speech-recog]: Recording"
        (arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
        sleep $sleepduration
        echo "[speech-recog]: Recording"
        (arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
        sleep $sleepduration
done
  • Incidentals: I assume the first line should be `#!/bin/bash` (missing `#`), otherwise it won't be recognized as a shebang line. Don't name your executable script `*.sh`: it mistakenly suggests a POSIX-compliant script and, generally, there's no need to use a suffix at all - let the shebang line alone determine how to execute the script (which also leaves you free to implement the script in a different language later). You're mixing Bash syntax (`==`) with POSIX syntax (`[ ... ]`) - unless you need to remain POSIX-compliant, using `[[ ... ]]` will make you happier. – mklement0 Jun 29 '16 at 04:11
  • More fundamentally: your question is "noisy", broad, and lacks focus. You stand a better chance of getting help if you ask a terse, specific, focused question. – mklement0 Jun 29 '16 at 04:13
  • Do the Python scripts really, really need to run with `sudo`? This looks like a security problem. – tripleee Jun 29 '16 at 04:14
  • To answer mklement0, Thanks for the help. I will look into fixing that problem. Furthermore, the # must have gotten lost in my copy and paste. Sorry about that. Also, I will focus my question more. Thank you for the suggestion. – supermitchell2 Jun 29 '16 at 04:36
  • tripleee, I am running them with sudo simply because I do not want to run into any issues with them lacking privileges. Could you explain the possible security risks to me? It would be great to understand them for the future. I will try removing sudo from the python scripts in this script. – supermitchell2 Jun 29 '16 at 04:39
  • It's a very basic security principle; run with the least amount of privileges you need. https://en.wikipedia.org/wiki/Principle_of_least_privilege If there is a permission problem, fix it instead. – tripleee Jun 29 '16 at 04:50
  • Oh, I did not know that. I'll fix the sudo. Thanks! – supermitchell2 Jun 29 '16 at 04:53
  • It is better to use python for everything. – Nikolay Shmyrev Jun 29 '16 at 21:11
  • Nikolay, I am not proficient enough with python to do this. If you are able to assist me with this or point me in the right direction, that would be excellent. – supermitchell2 Jun 29 '16 at 21:13

1 Answer

this is a broad question, so i will only propose a strategy without implementing it.

first, you need to record continuously to avoid missing any audio. you could accomplish this with

nohup arecord --max-file-time 1 out.wav &

this should record continuously, creating many 1 second wav files named like out-01.wav, out-02.wav, etc... (i wonder what happens after out-99.wav?) 1 second seems to be the smallest possible. nohup ... & causes it to run forever in the background.

next, you need a script to continuously check, in order, for any new complete wav files. for example, each time the next wav file exists, the current one must be done, so process the current one.
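
a minimal sketch of that polling step, assuming the chunks really are named out-01.wav, out-02.wav, ... as described above (worth verifying on your system). the function names chunk_name, chunk_is_complete and wait_for_chunk are made up for illustration:

```shell
#!/bin/bash
# Sketch only: the out-NN.wav naming follows the description above and
# should be checked against what arecord actually writes on your system.

# chunk_name N -> file name of the Nth chunk, e.g. out-03.wav for N=3.
# N must be a plain integer without leading zeros.
chunk_name() {
    printf 'out-%02d.wav' "$1"
}

# chunk_is_complete DIR N -> succeeds once chunk N and chunk N+1 both exist,
# i.e. the recorder has moved on and chunk N is no longer being written.
chunk_is_complete() {
    [ -e "$1/$(chunk_name "$2")" ] && [ -e "$1/$(chunk_name "$(($2 + 1))")" ]
}

# wait_for_chunk DIR N -> block until chunk N is complete, then print its path.
wait_for_chunk() {
    until chunk_is_complete "$1" "$2"; do
        sleep 0.2
    done
    printf '%s\n' "$1/$(chunk_name "$2")"
}
```

a driver loop would then keep a counter n, call wait_for_chunk . "$n", process the file it prints, and increment n.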

install sox and use

sox out-01.wav -n stats 2>&1 | grep 'RMS lev dB\|RMS Pk dB' | awk '{print $4}'

to get the average and peak volume of the current wav. if peak < -15 dB and lev < -15 dB, there's probably no speech, so delete the wav and move to the next. (test with your mic setup to choose specific thresholds for peak and lev.)
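
the threshold test could be wrapped up roughly like this. it is only a sketch: wav_levels and is_quiet are hypothetical helper names, -15 dB is just the starting value suggested above, and the comparison is done in awk because the shell's own arithmetic is integer-only:

```shell
#!/bin/bash
# Sketch only: the -15 dB default comes from the text above and needs tuning.

THRESHOLD_DB=-15

# wav_levels FILE -> prints the "RMS lev dB" and "RMS Pk dB" values,
# one per line, using the same sox pipeline as above (requires sox).
wav_levels() {
    sox "$1" -n stats 2>&1 | grep 'RMS lev dB\|RMS Pk dB' | awk '{print $4}'
}

# is_quiet LEV PEAK [THRESHOLD] -> succeeds when both values are below
# the threshold, i.e. the chunk probably contains no speech.
is_quiet() {
    awk -v lev="$1" -v peak="$2" -v thr="${3:-$THRESHOLD_DB}" \
        'BEGIN { exit !(lev + 0 < thr + 0 && peak + 0 < thr + 0) }'
}

# Example use (needs sox and an existing wav file):
#   set -- $(wav_levels out-01.wav)
#   if is_quiet "$1" "$2"; then rm out-01.wav; fi
```

keeping the comparison separate from the sox call makes it easy to try different thresholds against values you have already measured with your mic.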

if the volume is above threshold, then don't delete this wav. instead, rename it to maybespeech.wav, then move on to the next one.

if you find two above-threshold wavs in a row (i.e., you find an above-threshold wav when maybespeech.wav already exists), use sox to merge them into a new wav and replace maybespeech.wav with the merged wav. then move to the next one.

if you find a below-threshold wav when maybespeech.wav exists, then you're ready to do some speech recognition. rename maybespeech.wav to maybespeech.done.wav, convert it to flac, delete the wav, and curl the flac to the google speech api. maybe name the flac uniquely and do the curl in the background so that this doesn't block processing of the next wav.
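
the four cases above amount to a small state machine. here is a rough sketch of it; the action names (discard, start, append, flush) and the helper names are invented for illustration, merged.wav and maybespeech.flac are placeholder file names, and the upload step is left out because the curl command in the question is truncated:

```shell
#!/bin/bash
# Sketch only: action and helper names are made up for illustration.

# next_action LOUD PENDING -> prints what to do with the current chunk.
#   LOUD    = 1 if the current chunk is above threshold, else 0
#   PENDING = 1 if maybespeech.wav already exists, else 0
next_action() {
    case "$1$2" in
        00) echo discard ;;  # silence, nothing buffered
        10) echo start ;;    # first loud chunk
        11) echo append ;;   # another loud chunk in a row
        01) echo flush ;;    # silence after speech: time to recognize
        *)  echo "bad state: $1 $2" >&2; return 1 ;;
    esac
}

# apply_action ACTION CHUNK -> carry out the file operations in the
# current directory (append needs sox, flush needs flac).
apply_action() {
    case "$1" in
        discard) rm -f "$2" ;;
        start)   mv "$2" maybespeech.wav ;;
        append)  # sox concatenates multiple input files into the output
                 sox maybespeech.wav "$2" merged.wav &&
                 mv merged.wav maybespeech.wav && rm -f "$2" ;;
        flush)   # the quiet chunk itself is not needed; the upload of
                 # maybespeech.flac would follow after this step
                 rm -f "$2"
                 mv maybespeech.wav maybespeech.done.wav &&
                 flac -f -s -o maybespeech.flac maybespeech.done.wav &&
                 rm -f maybespeech.done.wav ;;
    esac
}
```

in the main loop, PENDING can be computed with a test such as [ -e maybespeech.wav ], and the flac name would need a unique suffix if uploads run in the background.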

best of luck!

webb
  • This is exactly what I am looking for. The sox command worked excellently. However, I do not understand the output I am getting from the arecord command. It produces a file named "nohup.out". From my research, this is expected. However, I am not getting any output wav files. Inside the "nohup.out" file it says "arecord: main:722: audio open error: No such file or directory". Is there something I am doing wrong? – supermitchell2 Jun 29 '16 at 21:00