I have been developing a small voice recognition program, similar to Siri or Amazon Echo, to simplify several small tasks around my home. I am extremely new to bash, so I would like some help reducing the continuous flow of data to the Google Speech To Text servers. Currently, I record a new audio file every three seconds and send it to the Google servers to be transcribed. This seems very inefficient. The relevant portion of the code is shown below.
# Install the Ctrl+C handler once, before the loop, instead of on every iteration.
trap CTRLc INT
while :
do
echo "[speech-recog]: Recording"
(arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
sleep $sleepduration
echo "[speech-recog]: Recording"
(arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
sleep $sleepduration
done
Instead, I think that making this script voice triggered would greatly reduce the internet traffic on my network. By voice triggered, I mean that it begins recording audio to send to Google only upon hearing a sound at or above a specific volume. I would appreciate any suggestions on how to create this sound trigger, or simply on how to reduce the number of requests to these servers in general.
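One low-effort way to cut requests without restructuring the loop might be to measure how loud each recorded clip is and skip the upload when it is essentially silent. The sketch below assumes SoX is installed and built with FLAC support; the helper names `peak_amplitude` and `is_loud` and the 0.05 threshold are hypothetical and would need tuning for a real microphone.

```shell
#!/bin/bash
# Sketch only: helper names and the 0.05 threshold are assumptions, not part of the original script.

# Print the peak amplitude (0.0 to 1.0) of an audio file using SoX's "stat" effect.
peak_amplitude() {
    # "stat" writes its report to stderr, so redirect it into the pipe.
    sox "$1" -n stat 2>&1 | awk '/^Maximum amplitude/ { print $3 }'
}

# Succeed (exit status 0) when peak $1 is at or above threshold $2.
# awk does the comparison because bash has no floating-point arithmetic.
is_loud() {
    awk -v peak="$1" -v limit="$2" 'BEGIN { exit !(peak + 0 >= limit + 0) }'
}

# Possible use inside the loop (commented out so the sketch has no side effects):
# if is_loud "$(peak_amplitude /dev/shm/out.flac)" 0.05; then
#     ... send /dev/shm/out.flac to the server ...
# fi
```

This keeps the fixed three-second recording but only spends a request on clips that contain some sound.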
Furthermore, the current method sometimes splits a single utterance across two or more files, because each fixed-length recording may start at any time relative to when the speaker begins. Triggering the recording upon hearing a sound would also fix this problem.
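A sound-triggered recording that also stops on trailing silence could be sketched with SoX's `rec` command and its `silence` effect, which waits for audio above a threshold before it starts writing and ends once the level stays below the threshold for a set time. This is a sketch under assumptions: SoX must be installed with FLAC support, the function names are hypothetical, and the 3% threshold, 0.1 s onset and 2.0 s trailing silence are illustrative values rather than tested settings.

```shell
#!/bin/bash
# Sketch only: function names and all numeric values are assumptions to be tuned.

# Print the arguments of SoX's "silence" effect.
#   $1 = seconds of sound needed to start, $2 = volume threshold,
#   $3 = seconds of quiet needed to stop.
silence_effect() {
    printf 'silence 1 %s %s 1 %s %s\n' "$1" "$2" "$3" "$2"
}

# Block until sound is heard, record until it stops, then return.
#   $1 = output file, for example /dev/shm/out.flac
record_utterance() {
    # AUDIODEV tells SoX which capture device to use; fall back to the script's default.
    # The command substitution is left unquoted on purpose so it splits into separate arguments.
    AUDIODEV="${hardware:-plughw:1,0}" \
        rec -q -r 16000 -c 1 "$1" $(silence_effect 0.1 3% 2.0)
}

# Possible use in place of the fixed-length arecord call:
# record_utterance /dev/shm/out.flac
```

Because each file then begins when speech begins and ends after a pause, one utterance should land in one file, and nothing is sent while the room is quiet.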
Any and all suggestions related to my code are welcome. If you need any further information, please ask in the comments and I will be happy to provide it. If you have any problems with my question, please leave a comment so I know not to make that mistake in the future. The bash script is shown below.
Note: The objective of this script is to write the response from the Google Speech To Text servers to a file called "SpeechLog.txt".
speech-recog.sh
#!/bin/bash
hardware="plughw:1,0"
duration="3"
sleepduration="3.05"
lang="en"
hw_bool=0
dur_bool=0
lang_bool=0
# Ask for confirmation on Ctrl+C; quit only if the answer is "yes" (any case).
CTRLc() {
    echo "[speech-recog]: Terminating Faide master script. Are you sure (yes/no)?"
    read -r ShouldQuit
    # Quote the expansion so an empty answer does not break the test.
    if [ "${ShouldQuit^^}" = "YES" ]
    then
        echo "[speech-recog]: Confirmation accepted, terminating script"
        sudo python3 Cleanup.py
        exit 0
    else
        echo "[speech-recog]: Denial accepted. Exiting confirmation request"
        clear
        echo "[speech-recog]: Listening..."
    fi
}
# Parse the options: a flag (-D, -d, -l) marks the next argument as its value.
for var in "$@"
do
    if [ "$var" == "-D" ] ; then
        hw_bool=1
    elif [ "$var" == "-d" ] ; then
        dur_bool=1
    elif [ "$var" == "-l" ] ; then
        lang_bool=1
    elif [ "$hw_bool" == 1 ] ; then
        hw_bool=0
        hardware="$var"
    elif [ "$dur_bool" == 1 ] ; then
        dur_bool=0
        duration="$var"
    elif [ "$lang_bool" == 1 ] ; then
        lang_bool=0
        lang="$var"
    else
        echo "[speech-recog]: Invalid option, valid options are -D for hardware, -d for duration and -l for language"
    fi
done
# Run VoiceMain.py once SpeechLog.txt holds more than one line.
CheckFile() {
    LineCount=$(wc -l < SpeechLog.txt)
    if [ "$LineCount" -gt 1 ]
    then
        sudo rm /dev/shm/out.flac
        sudo python3 VoiceMain.py
    fi
}
clear
echo "[speech-recog]: Speech recognition initialized"
echo "[speech-recog]: Listening..."
# Install the Ctrl+C handler once, before the loop, instead of on every iteration.
trap CTRLc INT
while :
do
echo "[speech-recog]: Recording"
(arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
sleep $sleepduration
echo "[speech-recog]: Recording"
(arecord -D $hardware -q -f S16_LE -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; curl -s -X POST$
sleep $sleepduration
done
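As a side note on the option handling in the script above, the flag variables (`hw_bool`, `dur_bool`, `lang_bool`) could probably be replaced by the shell's built-in `getopts`. The following is a minimal sketch, not a drop-in replacement: the function name `parse_options` is hypothetical, and unlike the original loop it stops with an error on an unknown option or a missing value.

```shell
#!/bin/bash
# Defaults mirror the ones in speech-recog.sh.
hardware="plughw:1,0"
duration="3"
lang="en"

# Parse -D <hardware>, -d <duration> and -l <language> with the getopts builtin.
# parse_options is a hypothetical name introduced for this sketch.
parse_options() {
    OPTIND=1  # reset so the function can be called more than once
    # The leading ":" selects silent mode, so errors are reported by the cases below.
    while getopts ":D:d:l:" opt; do
        case "$opt" in
            D) hardware="$OPTARG" ;;
            d) duration="$OPTARG" ;;
            l) lang="$OPTARG" ;;
            :) echo "[speech-recog]: Option -$OPTARG needs a value" >&2; return 1 ;;
            *) echo "[speech-recog]: Invalid option -$OPTARG" >&2; return 1 ;;
        esac
    done
}

# In the script this would be called as: parse_options "$@"
```

For example, `parse_options -D hw:2,0 -l de` would set `hardware` and `lang` and leave `duration` at its default.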