I have an array of floats returned by the librosa library's load function, and I would like to convert it to the AudioData object required by the Google SpeechRecognition library.
The data formats are shown below.
Side note: these are taken from comments in the library source files; I cannot find strict definitions like those in C.
librosa.load returns:
Returns
-------
y : np.ndarray [shape=(n,) or (2, n)]
audio time series
The SpeechRecognition AudioData class takes frame_data, which, as I understand it, is a string of bytes:
The raw audio data is specified by ``frame_data``, which is a sequence of bytes representing audio samples. This is the frame data structure used by the PCM WAV format.
How can I convert from an ndarray of floats to a sequence of bytes in Python?
In Visual Studio Code, audio as returned by librosa is shown as:
dtype:dtype('float32')
alignment:4
base:dtype('float32')
byteorder:'='
char:'f'
descr:[('', '<f4')]
fields:None
flags:0
hasobject:False
isalignedstruct:False
isbuiltin:1
isnative:True
itemsize:4
kind:'f'
metadata:None
name:'float32'
names:None
The full code I'm trying to run is:
# importing libraries
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data.mp3')
print(audio)
# conversion needed here
audio2 = spr.AudioData()
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio2)
Thanks,
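One possible approach, sketched under assumptions: since the quoted documentation says frame_data follows the PCM WAV layout, the float samples can be scaled to 16-bit integers and serialized with NumPy. The helper name float_to_pcm16_bytes is hypothetical, the input is assumed to be mono (shape (n,)) with values in roughly [-1.0, 1.0], and the short synthetic array stands in for the output of librosa.load so that the snippet runs without an audio file.

```python
import numpy as np


def float_to_pcm16_bytes(samples: np.ndarray) -> bytes:
    """Convert float samples (assumed in [-1.0, 1.0]) to 16-bit PCM bytes.

    Hypothetical helper for illustration; it is not part of librosa
    or SpeechRecognition.
    """
    # Clip first so out-of-range values cannot wrap around when cast to int16.
    clipped = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range, round, and store as little-endian 16-bit ints,
    # which is the sample layout used by 16-bit PCM WAV data.
    pcm = np.round(clipped * 32767.0).astype("<i2")
    return pcm.tobytes()


# Synthetic stand-in for: audio, sr = librosa.load('sample_data.mp3')
sr = 22050
audio = np.array([0.0, 0.5, -0.5, 1.0, -1.0], dtype=np.float32)

frame_data = float_to_pcm16_bytes(audio)
sample_width = 2  # bytes per sample for 16-bit PCM

# Each float32 sample becomes two bytes.
print(len(frame_data), "bytes for", audio.size, "samples")
```

With SpeechRecognition installed, the result could then be wrapped as spr.AudioData(frame_data, sr, sample_width) and passed to r.recognize_google. A stereo array of shape (2, n) would need to be downmixed or interleaved first, which this sketch does not handle.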