
I'm using the SpeechRecognition Python library to record audio bytes from my microphone in mono at 16 kHz, but I want to use the new Whisper library, which accepts NumPy arrays, spectrograms, and file paths. Writing to a file takes too long, so I'd like to convert the data directly to an array and pass it to Whisper.

The Thonnu

2 Answers


Here is a solution to your problem:

Assuming your code looks like this:

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone(device_index=device_index, sample_rate=16000) as source:
    audio = r.listen(source, timeout=None)

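you need to get raw 16-bit PCM bytes from the audio data (the output of your `Recognizer.listen`) (step 1). Use `get_raw_data()` rather than `get_wav_data()`: the latter returns a complete WAV file, and its header would otherwise be decoded as bogus samples in the next step.

audio_data = audio.get_raw_data(convert_rate=16000, convert_width=2)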

which can be converted to an array of `int16` (step 2):

import numpy as np
data_s16 = np.frombuffer(audio_data, dtype=np.int16)  # 2 bytes per sample

which can then be converted to an array of `float32` scaled to the range [-1.0, 1.0) (step 3):

float_data = data_s16.astype(np.float32) / 32768.0

Whisper works with 16 kHz mono `float32` audio, so this array can be passed straight to it, e.g. `model.transcribe(float_data)`. If there is a faster way (maybe combining steps 2 and 3), let me know.
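Steps 2 and 3 can be folded into one helper that makes a single `float32` copy and scales it in place, which saves one temporary array (no benchmark is offered here, so any speed gain is an assumption). A second helper covers the case where you start from `get_wav_data()`: it reads only the sample frames with the standard-library `wave` module, so the WAV header is never decoded as samples. Both function names are hypothetical, and the input is assumed to be 16-bit little-endian mono PCM, as recorded in the question:

```python
import io
import wave

import numpy as np


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert raw 16-bit little-endian PCM bytes to float32 samples in [-1.0, 1.0)."""
    # frombuffer views the bytes without copying; astype makes the one float32 copy.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32)
    # Scale in place rather than allocating another array with "/".
    samples /= 32768.0
    return samples


def wav_bytes_to_float32(wav_bytes: bytes) -> np.ndarray:
    """Decode in-memory 16-bit mono WAV bytes, skipping the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono WAV data")
        frames = wf.readframes(wf.getnframes())
    return pcm16_to_float32(frames)


# Synthetic stand-ins for audio.get_raw_data() and audio.get_wav_data().
raw = np.array([0, 16384, -32768], dtype="<i2").tobytes()
wav_buf = io.BytesIO()
with wave.open(wav_buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(raw)

print(pcm16_to_float32(raw))
print(wav_bytes_to_float32(wav_buf.getvalue()))
```

In the question's setup, `pcm16_to_float32(audio.get_raw_data())` should yield an array that can be handed to Whisper's `model.transcribe` as-is.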

Greetings

Datagniel

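Try the librosa library. Its `load` function returns the audio as a `float32` NumPy array together with the sample rate: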

librosa.load(path, *, sr=22050, mono=True, offset=0.0, duration=None, dtype=np.float32, res_type='soxr_hq')

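See the librosa documentation for `librosa.load`. Note that the default `sr=22050` resamples the audio, so pass `sr=16000` to keep the rate Whisper expects.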