4

I am trying to load an audio file in Python and process it with Google speech recognition.

The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert from one data type to another by creating a new object and repacking the data.

I don't understand how it's possible to convert from one data type to another in Python.

The code in question is below:

import speech_recognition as spr 
import librosa

audio, sr = librosa.load('sample_data/metal.mp3')

# create a speech recognition object 
r = spr.Recognizer() 

r.recognize_google(audio)

The error is:

audio_data must be audio data

How do I convert the audio object so that it can be used with Google speech recognition?

Mich

3 Answers

1

@Mich, I hope you have found a solution by now. If not, please try the below.

First, convert the .mp3 file to .wav with a separate tool as a preprocessing step, because AudioFile reads only WAV, AIFF, and FLAC files.
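One way to do that conversion step, as a minimal sketch: it assumes the ffmpeg command-line tool is installed and on the PATH (the answer does not name a tool), and the helper names build_ffmpeg_command and convert_to_wav are hypothetical. The mono, 16 kHz settings are a common choice for speech, not a requirement.

```python
import subprocess


def build_ffmpeg_command(src_path, dst_path, sample_rate=16000):
    """Return the ffmpeg argument list that converts src_path to a mono WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst_path if it already exists
        "-i", src_path,           # input file (for example an .mp3)
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the requested rate
        dst_path,                 # ffmpeg picks the WAV format from the extension
    ]


def convert_to_wav(src_path, dst_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    cmd = build_ffmpeg_command(src_path, dst_path, sample_rate)
    subprocess.run(cmd, check=True)
    return dst_path


# Hypothetical usage with the paths from the question:
# convert_to_wav("sample_data/metal.mp3", "sample_data/metal.wav")
```

The command is built as a list rather than a shell string, so file names with spaces do not need quoting.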

import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create an AudioFile instance from the converted file
audio_ex = sr.AudioFile('sample_data/metal.wav')
print(type(audio_ex))

# Read the whole file into an AudioData object
with audio_ex as source:
    audiodata = recognizer.record(source)
print(type(audiodata))

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')

print(text)

You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages

Additionally, you can set the minimum loudness threshold for the audio with the attribute below.

recognizer.energy_threshold = 300  # minimum energy level treated as speech (300 is the default)
Srinivas
0

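Librosa returns a NumPy array of floats; you need to convert it back to 16-bit PCM (WAV) data. Something like this:

import numpy as np

# Scale to the int16 range using the peak value, then pack as raw bytes
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()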

You are probably better off loading the mp3 with an ffmpeg wrapper instead of librosa; librosa does strange things to the audio (normalizes it, etc.). It's better to work with the raw data.
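To connect this to the question, here is a minimal sketch of wrapping the converted bytes in an AudioData object, which is the type recognize_google expects. The helper names float_to_pcm16 and to_audio_data are hypothetical. The sketch assumes mono float samples in the range -1.0 to 1.0, which is what librosa.load returns by default, and it clips out-of-range values instead of rescaling by the peak, so silent input does not cause a division by zero.

```python
import numpy as np


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    samples = np.asarray(samples, dtype=np.float32)
    clipped = np.clip(samples, -1.0, 1.0)  # guard against out-of-range values
    return (clipped * 32767).astype("<i2").tobytes()


def to_audio_data(samples, sample_rate):
    """Wrap the PCM bytes in a speech_recognition AudioData object."""
    # Imported here so the conversion helper works without the library installed.
    import speech_recognition as spr

    # The last argument is the sample width in bytes: 2 for 16-bit audio.
    return spr.AudioData(float_to_pcm16(samples), sample_rate, 2)


# Hypothetical usage, given `audio, sr = librosa.load(...)` as in the question:
# r = spr.Recognizer()
# text = r.recognize_google(to_audio_data(audio, sr))
```

Passing the sample rate returned by librosa.load matters here, because AudioData has no other way to know how fast the samples should be played.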

Nikolay Shmyrev
0

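Try this with speech_recognition alone:

import speech_recognition as spr

# Create the recognizer before it is used
r = spr.Recognizer()

# AudioFile (the current name for WavFile) reads WAV, AIFF, or FLAC, not mp3,
# so the file has to be converted to .wav first
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)

print(r.recognize_google(audio))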
erptocoding
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format – keramat Jan 30 '22 at 07:13