1

I am having trouble with OpenAI's whisper and faster-whisper when processing audio files. Some files fail to process fully and the progress bar freezes; this happens seemingly at random, regardless of the file's duration. I suspect the issue is related to hardware performance, as I am using a mid-range GPU and CPU. I tried splitting the speech recognition into smaller chunks, but the problem persists. Can you suggest any debugging steps or solutions?

Nick ODell
Parzival
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 30 '23 at 16:03
  • Can you please provide information on how you're using Whisper - such as in a Jupyter notebook, or on the command line? Is there anything distinct between the files where Whisper freezes and the files where it does not? How long are the audio files? – Kathy Reid May 01 '23 at 23:54

2 Answers

1

Have you monitored your memory and GPU usage? If nvtop shows levels that are reasonable and not growing over time, then it probably isn't a resource issue.
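If you would rather check this from inside the script than watch nvtop, here is a minimal sketch that samples PyTorch's own memory counters between chunks. The helper names `gpu_memory_mib` and `is_growing` and the 50 MiB tolerance are made up for illustration; only `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` are real PyTorch calls. It only sees memory held by PyTorch in the current process, so it would not reflect memory used by faster-whisper's backend.

```python
import importlib.util


def gpu_memory_mib():
    """Return (allocated, reserved) GPU memory in MiB, or None without CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed
    import torch

    if not torch.cuda.is_available():
        return None  # no usable GPU
    mib = 1024 ** 2
    return (torch.cuda.memory_allocated() / mib,
            torch.cuda.memory_reserved() / mib)


def is_growing(samples, tolerance_mib=50.0):
    """True if the last sample exceeds the first by more than the tolerance.

    `samples` is a list of allocated-memory readings in MiB, one per chunk.
    The tolerance is an arbitrary illustrative threshold, not a tuned value.
    """
    if len(samples) < 2:
        return False
    return samples[-1] - samples[0] > tolerance_mib
```

Call `gpu_memory_mib()` after each chunk, collect the first value of each result in a list, and pass that list to `is_growing`. Steady readings would point away from a memory leak.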

On the other hand, if it is a GPU memory issue, you could try freeing GPU memory before loading a model or processing each chunk: delete any variables that hold results previously returned by whisper (or set them to None), then run:

import gc
import torch

gc.collect()
torch.cuda.empty_cache()
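Applied per chunk, the cleanup described above might look roughly like the sketch below. `transcribe_chunk` is a hypothetical callable that you supply; it is assumed to return a dict with a "text" key, as openai-whisper's `model.transcribe` does (faster-whisper returns a different structure, so adapt accordingly). The `try`/`except` around the import only lets the sketch run on machines without PyTorch.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None


def free_gpu_memory():
    """Collect unreachable Python objects, then release cached CUDA blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def transcribe_chunks(chunks, transcribe_chunk):
    """Transcribe each chunk, dropping the result object before cleaning up.

    `transcribe_chunk` is a placeholder for your own wrapper around whisper.
    """
    texts = []
    for chunk in chunks:
        result = transcribe_chunk(chunk)
        texts.append(result["text"])  # keep only the plain string
        result = None                 # drop the reference to the full result
        free_gpu_memory()
    return texts
```

Keeping only the text string means nothing returned by the model stays referenced between chunks, which is what allows the collector and the cache release to have any effect.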

The bad news is that when I did this, it reduced my memory usage but didn't solve the hangs. Switching to CPU-only processing didn't help either: instead of the whole process hanging, the call appeared to end normally, but the returned transcript was incomplete.
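Since an incomplete transcript can come back without any error, it may be worth checking how far the transcript actually reaches. The sketch below assumes the segment layout of openai-whisper's result (`result["segments"]`, a list of dicts with an "end" time in seconds). The function name is hypothetical, and you have to obtain the audio duration yourself.

```python
def transcript_gap_seconds(segments, audio_duration):
    """Return how many seconds at the end of the audio have no transcript.

    `segments` is assumed to be a list of dicts with an "end" key in seconds.
    """
    if not segments:
        return audio_duration  # nothing was transcribed at all
    last_end = max(seg["end"] for seg in segments)
    return max(0.0, audio_duration - last_end)
```

A large gap on a file that contains speech right up to the end suggests the transcription stopped early, and it also tells you roughly where in the audio to look.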

St. Eve
0

It does not seem to happen when I use CPU mode. Perhaps it's a resource constraint issue.

Parzival