What I'm trying to do
I'm trying to transcribe Telegram audio messages using Mozilla's speech-to-text engine, DeepSpeech.
Using *.wav files in 16-bit, 16 kHz works flawlessly.
I want to add *.ogg Opus support, since Telegram uses this format for its audio messages.
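For reference, this is roughly how the WAV input that already works can be checked. It is only a sketch using the standard-library wave module. The helper name is made up for illustration, and the mono check is my assumption (only 16-bit and 16 kHz are what I know to work).

```python
import wave


def is_16bit_16khz_mono(path):
    """Return True if the WAV file is 16-bit PCM, 16 kHz, single channel.

    Hypothetical helper; the mono requirement is an assumption.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample = 16 bit
            and wav.getframerate() == 16000  # 16 kHz sample rate
            and wav.getnchannels() == 1      # mono (assumed)
        )
```

Any decoded Opus audio would presumably have to end up in the same shape before it is handed to the model.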
What I have tried so far
I have tried pyogg and soundfile so far, with no luck.
soundfile could not read the Opus format at all, and pyogg is a pain to install without conda. I also had some really weird moments where it literally crashed Python.
Right now, I'm trying librosa with mixed results.
data, sample_rate = librosa.load(path)
tmp = np.array(data, np.float16)
tmp.dtype = np.int16
int16 = np.array(tmp, dtype=np.int16)
metadata = model.sttWithMetadata(int16)
DeepSpeech really likes np.int16. model.sttWithMetadata is essentially the call for the transcriber.
Right now it does transcribe something, but the output is nowhere near what I actually say in my audio message.
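One thing that may be related: as far as I understand, librosa.load returns float samples in the range [-1.0, 1.0], and assigning to tmp.dtype only reinterprets the raw float16 bytes as integers instead of scaling the values. The sketch below shows the difference. float_to_int16 is a hypothetical helper, not part of librosa or DeepSpeech, and the 32767 scale factor assumes the input really is in [-1.0, 1.0].

```python
import numpy as np


def float_to_int16(samples):
    """Scale float samples in [-1.0, 1.0] to 16-bit PCM values.

    Hypothetical helper; assumes the input is already normalised.
    """
    as_float = np.asarray(samples, dtype=np.float32)
    clipped = np.clip(as_float, -1.0, 1.0)   # guard against overshoot
    return (clipped * 32767).astype(np.int16)


# Reinterpreting the bytes of a float16 array as int16 exposes the raw
# bit pattern of each float, which is not the same as the sample value.
half = np.array([0.5], dtype=np.float16)
reinterpreted = half.view(np.int16)   # same effect as `half.dtype = np.int16`
scaled = float_to_int16([0.5])

print(reinterpreted, scaled)          # the two arrays differ
```

With librosa, this would presumably be used as float_to_int16(data) after loading with librosa.load(path, sr=16000), since librosa.load resamples to 22050 Hz by default. That the model expects 16 kHz is an assumption based on the WAV files that work.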