I have an array of floats returned by librosa's load function that I would like to convert to the AudioData object required by the SpeechRecognition library (used here with its Google recognizer).

The data formats are shown below.

Side note: these are taken from comments in the library source files; I cannot find strict definitions like those in C.

librosa.load returns:

Returns
-------
y    : np.ndarray [shape=(n,) or (2, n)]
    audio time series

The SpeechRecognition AudioData class takes frame_data, which, as I understand it, is a string of bytes:

The raw audio data is specified by ``frame_data``, which is a sequence of bytes representing audio samples. This is the frame data structure used by the PCM WAV format.

How can I convert from an ndarray of floats to a sequence of bytes in Python?

In Visual Studio Code, the dtype of audio as returned by librosa is shown as:

dtype:dtype('float32')
alignment:4
base:dtype('float32')
byteorder:'='
char:'f'
descr:[('', '<f4')]
fields:None
flags:0
hasobject:False
isalignedstruct:False
isbuiltin:1
isnative:True
itemsize:4
kind:'f'
metadata:None
name:'float32'
names:None

The full code I'm trying to run is:

# importing libraries 
import speech_recognition as spr 
import librosa

audio, sr = librosa.load('sample_data.mp3')

print(audio)

# create a speech recognition object 

#conversion needed here
audio2 = spr.AudioData() 

r = spr.Recognizer() 
r.recognize_google(audio2)
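
For reference, the conversion I am after might look roughly like the sketch below. It rests on assumptions I have not confirmed: that the float samples lie in [-1.0, 1.0], that 16-bit little-endian PCM is an acceptable frame format, and that AudioData takes the frame bytes, the sample rate and the sample width in bytes, in that order. The helper names float_to_pcm16_bytes and to_audio_data are made up for illustration.

```python
import numpy as np


def float_to_pcm16_bytes(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 2:
        # librosa uses shape (2, n) for stereo; average the channels to mono
        samples = samples.mean(axis=0)
    # Clip first so out-of-range values cannot wrap around in int16
    clipped = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range and serialise as little-endian 2-byte integers
    return (clipped * 32767.0).astype("<i2").tobytes()


def to_audio_data(samples, sample_rate):
    """Wrap the PCM bytes in an AudioData object (assumed constructor order)."""
    # Imported lazily so the byte conversion can be used without the library
    import speech_recognition as spr

    # The final argument is the sample width in bytes: 2 for 16-bit PCM
    return spr.AudioData(float_to_pcm16_bytes(samples), int(sample_rate), 2)
```

With the variables from my code above, this would be called as audio2 = to_audio_data(audio, sr) in place of the empty spr.AudioData() call.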

Thanks,

Mich
  • You asked the same question, right? https://stackoverflow.com/questions/60879469/audio-data-must-be-audio-data-error-with-google-speech-recognition-in-python – Nikolay Shmyrev Mar 27 '20 at 22:54
  • Does this answer your question? ['Audio data must be audio data' error with google speech recognition in python](https://stackoverflow.com/questions/60879469/audio-data-must-be-audio-data-error-with-google-speech-recognition-in-python) – Jon Nordby Mar 29 '20 at 17:42