
I want to run speech recognition on a WAV file. To do that, I split the file into multiple chunks, export each chunk, and then pass it to the SpeechRecognition library.

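from pydub import AudioSegment
import speech_recognition as sr

CHUNK_MS = 1000  # chunk length in milliseconds (pydub slices by ms)

r = sr.Recognizer()
# Load the source file once instead of on every iteration.
audio = AudioSegment.from_wav("some_wav.wav")

for i in range(5):
    # Consecutive, non-overlapping chunks of CHUNK_MS each.
    audio_chunk = audio[i * CHUNK_MS:(i + 1) * CHUNK_MS]
    audio_chunk.export('test.wav', format='wav')  # temporary file I want to avoid
    detection = sr.AudioFile('test.wav')

    with detection as source:
        recorded = r.record(source)

    word = r.recognize_google(recorded, language='ro-RO')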

The problem is that this is inefficient: every chunk is written to disk and then read back. I want to get rid of the WAV export step, convert audio_chunk to bytes, and pass those in-memory bytes to sr.AudioFile().

Is there a way to convert an AudioSegment into bytes?
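
One possible direction, as a minimal sketch rather than a confirmed answer: pydub's export() accepts a file-like object in place of a filename, and sr.AudioFile() accepts a file-like object as well, so an io.BytesIO buffer can stand in for test.wav. The helper names segment_to_wav_bytes and recognize_chunk are hypothetical, introduced only for illustration.

```python
import io


def segment_to_wav_bytes(segment) -> bytes:
    """Return `segment` encoded as a complete WAV file held in memory.

    `segment` is assumed to be a pydub AudioSegment, whose export()
    accepts a file-like object in place of a filename.
    """
    buffer = io.BytesIO()
    segment.export(buffer, format="wav")
    return buffer.getvalue()


def recognize_chunk(recognizer, segment, language="ro-RO"):
    """Transcribe one chunk without writing a file (hypothetical helper)."""
    # Imported here so the byte-conversion helper above has no hard
    # dependency on SpeechRecognition.
    import speech_recognition as sr

    wav_file = io.BytesIO(segment_to_wav_bytes(segment))
    with sr.AudioFile(wav_file) as source:
        recorded = recognizer.record(source)
    return recognizer.recognize_google(recorded, language=language)
```

Inside the loop this would be called as word = recognize_chunk(r, audio_chunk). If only the raw samples are needed, audio_chunk.raw_data holds them without a WAV header, and it may be possible to build sr.AudioData(audio_chunk.raw_data, audio_chunk.frame_rate, audio_chunk.sample_width) directly, assuming the chunk is mono.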

TheGainadl
