'Audio data must be audio data' error with google speech recognition in python

Question

I am trying to load an audio file in Python and process it with Google speech recognition.

The problem is that, unlike in C++, Python doesn't show data types or classes, or give you access to memory so that you can convert from one data type to another by creating a new object and repacking the data.

I don't understand how it's possible to convert from one data type to another in Python.

The code in question is below:

import speech_recognition as spr 
import librosa

audio, sr = librosa.load('sample_data/metal.mp3')

# create a speech recognition object 
r = spr.Recognizer() 

r.recognize_google(audio)

The error is:

audio_data must be audio data

How do I convert the audio object so that it can be used with Google speech recognition?

Srinivas · Answer 1 · 2021-09-01T13:59:46.983

@Mich, I hope you have found a solution by now. If not, please try the following.

First, convert the .mp3 file to .wav format with a separate tool as a pre-processing step.
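One way to do that pre-processing step is to call the ffmpeg command-line tool from Python. This is only a sketch: it assumes ffmpeg is installed and on the PATH, the helper names `build_ffmpeg_cmd` and `convert_to_wav` are made up for illustration, and the 16000 Hz mono output is an assumed setting rather than something the recognizer requires.

```python
import subprocess


def build_ffmpeg_cmd(src_path, dst_path, sample_rate=16000):
    """Return an ffmpeg command that decodes src_path to a mono WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst_path if it already exists
        "-i", src_path,           # input file, e.g. the mp3
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample (16000 Hz is an assumed default)
        dst_path,                 # the .wav extension selects WAV output
    ]


def convert_to_wav(src_path, dst_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_cmd(src_path, dst_path, sample_rate), check=True)
```

Calling `convert_to_wav('sample_data/metal.mp3', 'sample_data/metal.wav')` should then produce the .wav file that the code below reads.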

import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create an AudioFile instance from the converted file
audio_ex = sr.AudioFile('sample_data/metal.wav')
print(type(audio_ex))

# Read the whole file into an AudioData object
with audio_ex as source:
    audiodata = recognizer.record(source)
print(type(audiodata))

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')

print(text)

You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages

Additionally, you can set the minimum energy (loudness) threshold that is treated as speech with the command below.

recognizer.energy_threshold = 300  # minimum energy level treated as speech

Nikolay Shmyrev · Answer 2 · 2020-04-03T15:41:17.037


librosa returns a NumPy array of floats, so you need to convert it back to 16-bit PCM data, as found in a wav file. Something like this:

import numpy as np
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()  # audio is the array from librosa.load

You are probably better off loading the mp3 with an ffmpeg wrapper instead of librosa, since librosa does unexpected things to the audio (normalizes it, etc.). It's better to work with the raw data.
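To make the conversion more concrete, here is a standard-library sketch that scales float samples by their peak (the same scaling as the one-liner above) and wraps the result in an in-memory WAV file. The function name `floats_to_wav_bytes` is hypothetical, and mono input is assumed; `samples` would be the array returned by `librosa.load` and `sample_rate` its second return value.

```python
import io
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate):
    """Pack mono float samples into an in-memory 16-bit PCM WAV file."""
    values = [float(x) for x in samples]
    # Scale by the peak so the loudest sample maps to +/-32767;
    # fall back to 1.0 so that pure silence does not divide by zero.
    peak = max((abs(v) for v in values), default=0.0) or 1.0
    ints = [int(v / peak * 32767) for v in values]
    pcm = struct.pack("<%dh" % len(ints), *ints)  # little-endian int16

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono is an assumption
        wav.setsampwidth(2)        # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Since `AudioFile` in speech_recognition accepts a file-like object as well as a path, the result could then be read with something like `spr.AudioFile(io.BytesIO(wav_bytes))` followed by `r.record(source)`, which yields the `AudioData` object that `recognize_google` expects.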

edited Apr 03 '20 at 15:41

answered Mar 27 '20 at 07:46


What is data and what is audio in this answer? – MattSt Apr 03 '20 at 13:37

score 0 · Answer 3 · answered Apr 04 '21 at 14:06


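Try this with the speech recognizer:

import speech_recognition as spr

r = spr.Recognizer()  # create the recognizer before it is used

# AudioFile reads WAV, AIFF, or FLAC, so convert the mp3 first
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)  # AudioData, the type recognize_google expects

print(r.recognize_google(audio))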
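The file handed to the recognizer has to be PCM WAV, AIFF/AIFF-C, or native FLAC, so an mp3 needs converting first. As a rough sketch, the hypothetical helper `sniff_audio_format` below (not part of speech_recognition) guesses the container from the first bytes of a file, which can help explain why a file is rejected. The mp3 check is a heuristic only.

```python
def sniff_audio_format(header):
    """Guess the audio container from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    # mp3 files usually start with an ID3 tag or an MPEG frame sync
    # (11 set bits); other MPEG audio streams can match the sync too.
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and header[1] & 0xE0 == 0xE0:
        return "mp3"
    return "unknown"
```

It could be used as `sniff_audio_format(open(path, 'rb').read(12))` before deciding whether a conversion step is needed.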

erptocoding

ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format – keramat Jan 30 '22 at 07:13
