I am experiencing issues with OpenAI's Whisper and faster-whisper when processing audio files. Specifically, some files fail to process fully and the progress bar freezes; the freeze occurs at seemingly random points and is not tied to any particular file duration. I suspect this may be related to hardware performance, as I am using a mid-range GPU and CPU. I have attempted to work around it by breaking the speech recognition into smaller chunks (see the sketch below), but the problem persists. Can you suggest any debugging steps or solutions?
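For reference, this is roughly the chunked approach I tried (a minimal sketch; it assumes pydub for splitting, a placeholder input file name, and an arbitrary 30-second chunk length):

import whisper
from pydub import AudioSegment

model = whisper.load_model("medium")
audio = AudioSegment.from_file("input.wav")
chunk_ms = 30 * 1000  # 30-second slices, chosen arbitrarily

texts = []
for start in range(0, len(audio), chunk_ms):
    chunk = audio[start:start + chunk_ms]       # pydub slices by milliseconds
    chunk.export("chunk.wav", format="wav")     # write the slice to a temporary file
    result = model.transcribe("chunk.wav")      # transcribe one chunk at a time
    texts.append(result["text"])

print(" ".join(texts))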
Please provide enough code so others can better understand or reproduce the problem. – Community Apr 30 '23 at 16:03
Can you please provide information on how you're using Whisper - such as in a Jupyter notebook, or on the command line? Is there anything distinct between the files where Whisper freezes and the files where it does not? How long are the audio files? – Kathy Reid May 01 '23 at 23:54
2 Answers
Have you monitored your memory and GPU usage? If nvtop shows levels that are reasonable and not growing over time, then it probably isn't a resource issue.
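If you would rather check from inside the script, something along these lines reports PyTorch's GPU memory use (a rough sketch; it assumes a CUDA build of PyTorch):

import torch

def log_gpu_memory(tag=""):
    # How much GPU memory PyTorch has allocated and reserved, in MiB.
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"[{tag}] allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

Call it before and after each chunk; numbers that grow steadily over time point to a leak, while flat numbers suggest the hang is not a memory problem.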
On the other hand, if it is a GPU memory issue, you could try cleaning up GPU memory before loading the model and after processing each chunk: delete (or set to None) any variables previously returned by Whisper, then run:
import gc, torch

gc.collect()               # drop unreferenced Python objects first
torch.cuda.empty_cache()   # then release PyTorch's cached GPU memory back to the driver
The bad news is that when I did this, although it did reduce my memory usage, it didn't stop the hangs. Switching to CPU-only processing didn't help either: instead of the whole process hanging, the call appeared to finish normally but the returned transcript was incomplete.
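For completeness, this is the kind of CPU-only call I mean (a sketch; the model size and file name are placeholders):

import whisper

# Force CPU inference; fp16 is disabled because half precision is CUDA-only.
model = whisper.load_model("medium", device="cpu")
result = model.transcribe("audio.wav", fp16=False)
print(result["text"])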

The freeze does not seem to happen when I use CPU mode, so perhaps it is a GPU resource constraint.
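For what it's worth, forcing CPU mode in faster-whisper looks roughly like this (a sketch; the model size, compute type, and file name are placeholders):

from faster_whisper import WhisperModel

# Run entirely on the CPU; int8 keeps memory use low at some accuracy cost.
model = WhisperModel("medium", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(segment.text)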
