I want to run speech recognition on a WAV file. To do that, I split the WAV into multiple chunks, export each chunk to a file, and then pass that file to the SpeechRecognition library:
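```python
from pydub import AudioSegment
import speech_recognition as sr

r = sr.Recognizer()
# Load the source file once instead of on every iteration
audio = AudioSegment.from_wav("some_wav.wav")

for i in range(5):
    # Consecutive 3-second chunks (pydub slices in milliseconds)
    audio_chunk = audio[i * 3000:(i + 1) * 3000]
    audio_chunk.export('test.wav', format='wav')
    detection = sr.AudioFile('test.wav')
    with detection as source:
        # Use a separate name so the loaded AudioSegment is not overwritten
        recorded = r.record(source)
    word = r.recognize_google(recorded, language='ro-RO')
    print(word)
```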
This works, but it is inefficient: every chunk is written to `test.wav` on disk and then read back. I want to get rid of the export step, convert `audio_chunk` to bytes, and pass those in-memory bytes to `sr.AudioFile()` instead.

Is there a way to convert an `AudioSegment` into bytes?