
I'm using the excellent deepspeech package to transcribe an audio file in Python. Here's my quick implementation:

import wave
import deepspeech
import numpy as np

# Load the pre-trained DeepSpeech model
model_file_path = 'deepspeech-0.9.3-models.pbmm'
model = deepspeech.Model(model_file_path)

# Read the entire WAV file into a 16-bit PCM buffer
filename = 'podcast.wav'
w = wave.open(filename, 'rb')
frames = w.getnframes()
buffer = w.readframes(frames)
data16 = np.frombuffer(buffer, dtype=np.int16)

# Run speech-to-text on the full audio at once
text = model.stt(data16)

podcast.wav is a ~20 minute audio file. The model.stt(data16) call ran for over 10 minutes before I interrupted it, which is unexpectedly slow given that a GPU is available (I'm using Google Colab). I suspect the script isn't using the GPU. Is there a way to rewrite the above code to ensure it runs on the GPU? I can confirm that deepspeech-gpu is installed.


1 Answer


Having only deepspeech-gpu installed should do it.

pip install deepspeech-gpu

Try uninstalling the CPU version that you might have previously installed.

pip uninstall deepspeech
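
If you're running this in a Colab notebook, you can do both from a single cell (the -y flag skips pip's confirmation prompt), then restart the runtime so the newly installed package is picked up:

!pip uninstall -y deepspeech
!pip install deepspeech-gpu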

You can verify this by monitoring your GPU usage while the transcription runs. See: Display GPU Usage While Code is Running in Colab.
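
As a minimal sketch, assuming a Colab runtime with an NVIDIA GPU attached, you can poll nvidia-smi from a separate cell while model.stt() is running; non-zero GPU memory usage from your Python process indicates the GPU build is actually being used:

import subprocess

# Print current GPU utilization and memory usage (assumes nvidia-smi is on PATH)
print(subprocess.run(['nvidia-smi'], capture_output=True, text=True).stdout)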
