Currently I am working with a speech-to-text model that takes a .wav file and turns the audible speech in the audio into a text transcript. The model worked fine on .wav recordings that were captured directly; now I am trying to do the same with audio that was originally part of a video.
The steps are as follows:
- retrieve a video file from a stream URL through ffmpeg
- strip the .aac audio from the video
- convert the .aac audio to .wav
- save the .wav to S3 for later use
The commands I use are listed below for reference:
# clear out any previous files in the temp directory
rm /tmp/jonas/*
# grab the video from the stream URL and save its AAC audio track
ffmpeg -i {stream_url} -c copy -bsf:a aac_adtstoasc /tmp/jonas/{filename}.aac
# convert the .aac to .wav with ffmpeg's default settings
ffmpeg -i /tmp/jonas/{filename}.aac /tmp/jonas/{filename}.wav
# upload the .wav to S3
aws s3 cp /tmp/jonas/{filename}.wav {s3_audio_save_location}
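One thing I am considering is skipping the intermediate .aac step and writing the .wav directly with explicit output parameters, in case the model expects a specific format; the 16 kHz / mono / 16-bit PCM values below are only my assumption about what the model wants:

# extract the audio in one step, forcing mono, 16 kHz, 16-bit PCM (assumed target format)
ffmpeg -i {stream_url} -vn -ac 1 -ar 16000 -c:a pcm_s16le /tmp/jonas/{filename}.wav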
The problem now is that my speech-to-text model no longer works on this audio. I use sox to convert the audio, but sox does not seem to pick up the audio, and without sox the model does not work either. This leads me to believe there is a difference in the .wav formatting. I would therefore like to know how I can either format the .wav with the same settings as a .wav that does work, or find a way to compare the formatting of the two .wav files and set the new .wav to the correct format manually through ffmpeg.
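For the comparison part, I was planning to dump the audio stream parameters (codec, sample rate, channels, bits per sample) of both files with ffprobe, which ships with ffmpeg; the file names below are just placeholders:

# show the audio stream parameters of both files so they can be diffed (file names are placeholders)
ffprobe -v error -select_streams a:0 -show_streams working.wav
ffprobe -v error -select_streams a:0 -show_streams not_working.wav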
I also ran exiftool (through its Python wrapper) and got the metadata of the two files:
The metadata of the working .wav file is:
The metadata of the .wav file that does not work is:
So, as can be seen, the working .wav file has some settings that differ from the second .wav file, and I would like to mimic them; presumably that would make my model work again :)
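If it turns out the difference is just the sample rate, channel count, or bit depth, I am assuming I could also force those on the existing file with sox; the target values below are placeholders until I know the working file's actual settings:

# re-encode the non-working wav to the assumed target format (placeholder values: 16-bit, 44.1 kHz, stereo)
sox /tmp/jonas/{filename}.wav -b 16 /tmp/jonas/{filename}_fixed.wav rate 44100 channels 2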
With kind regards, Jonas