How do I extract MFCC features from an audio signal (x.wav)? I already know the steps of MFCC-based audio feature extraction; what I am looking for is clean, working Python code that I can use within the Django framework.
I am doing research on voice recognition using MFCC – Senthuja Jan 12 '19 at 13:41
2 Answers
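Feature extraction is the most important step in building a speech recognizer: once the speech signal has been converted into the frequency domain, it still has to be turned into a usable feature vector. The following code computes MFCC and log filter bank features with python_speech_features and plots them: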
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank

# Read the WAV file: returns the sampling rate and the samples
frequency_sampling, audio_signal = wavfile.read("/home/user/Downloads/OSR_us_000_0010_8k.wav")

# Keep only the first 15000 samples
audio_signal = audio_signal[:15000]

# MFCC features: one row per window, one column per coefficient
features_mfcc = mfcc(audio_signal, frequency_sampling)
print('\nMFCC:\nNumber of windows =', features_mfcc.shape[0])
print('Length of each feature =', features_mfcc.shape[1])

# Transpose so that time runs along the horizontal axis of the plot
features_mfcc = features_mfcc.T
plt.matshow(features_mfcc)
plt.title('MFCC')

# Log mel filter bank energies, computed on the same windows
filterbank_features = logfbank(audio_signal, frequency_sampling)
print('\nFilter bank:\nNumber of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])

filterbank_features = filterbank_features.T
plt.matshow(filterbank_features)
plt.title('Filter bank')
plt.show()
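To make the steps behind the mfcc call more concrete, here is a rough NumPy-only sketch of the usual pipeline: pre-emphasis, framing, windowing, power spectrum, mel filter bank, log, and DCT. The function names (hz_to_mel, mel_to_hz, mel_filterbank, simple_mfcc) and the default parameters are made up for illustration. This is not how python_speech_features is implemented internally, and the numbers will not match its output exactly (different framing at the end of the signal, a Hamming window, an unnormalized DCT, no liftering).

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(num_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def simple_mfcc(signal, sample_rate, frame_len=0.025, frame_step=0.01,
                num_filters=26, num_ceps=13, preemph=0.97):
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis: boost high frequencies
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    # 2. Split into overlapping frames and apply a Hamming window
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(emphasized) < flen:
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    num_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(num_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)

    # 3. Power spectrum of each frame (FFT size: next power of two >= flen)
    nfft = 1 << (flen - 1).bit_length()
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft

    # 4. Mel filter bank energies, then log (floored to avoid log(0))
    energies = power @ mel_filterbank(num_filters, nfft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))

    # 5. Unnormalized DCT-II over the filter axis; keep the first coefficients
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.arange(num_ceps)[:, None] * (2 * n + 1)
                   / (2 * num_filters))
    return log_energies @ basis.T          # shape: (num_frames, num_ceps)
```

For real work the library call is the better choice; the sketch is only meant to show where each step of the MFCC recipe ends up in code.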
Alternatively, you can use this function to extract the features:
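import numpy as np
from sklearn import preprocessing
import python_speech_features as mfcc

def extract_features(audio, rate):
    """Extract 20-dim MFCC features from an audio signal, perform CMS
    (here via per-coefficient mean/variance scaling) and append the
    deltas to make a 40-dim feature vector."""
    # 25 ms windows, 10 ms step, 20 cepstral coefficients
    mfcc_feature = mfcc.mfcc(audio, rate, 0.025, 0.01, 20, nfft=1200, appendEnergy=True)
    # Scale each coefficient to zero mean and unit variance
    mfcc_feature = preprocessing.scale(mfcc_feature)
    # The original snippet called calculate_delta, which it never defined;
    # python_speech_features.delta (N=2 frames of context) is swapped in here.
    delta = mfcc.delta(mfcc_feature, 2)
    # Stack static and delta features side by side: (frames, 40)
    combined = np.hstack((mfcc_feature, delta))
    return combined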
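This answer refers to a calculate_delta helper without ever defining it. The following is a minimal sketch of what such a helper might compute, assuming the standard regression formula for delta features, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), with the first and last frames repeated at the edges. The signature and the default N=2 are assumptions, not the author's code.

```python
import numpy as np

def calculate_delta(features, N=2):
    """Delta (first-order difference) features over +/- N frames.

    features: array of shape (num_frames, num_coefficients)
    returns:  array of the same shape
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    denominator = 2 * sum(n * n for n in range(1, N + 1))

    # Repeat the edge frames so every frame has N neighbours on each side
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")

    delta = np.zeros_like(features)
    for n in range(1, N + 1):
        ahead = padded[N + n : N + n + num_frames]    # c_{t+n}
        behind = padded[N - n : N - n + num_frames]   # c_{t-n}
        delta += n * (ahead - behind)
    return delta / denominator
```

With a helper like this in scope, np.hstack((mfcc_feature, calculate_delta(mfcc_feature))) yields the 40-dimensional vector that the docstring describes.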

Senthuja
You can use the following code to extract MFCC features from an audio file with the librosa package (it is easy to install and use):
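import librosa
import librosa.display

audio_path = 'my_audio_file.wav'
# librosa.load resamples to 22050 Hz and converts to mono by default
x, sr = librosa.load(audio_path)
# Pass the signal as the keyword argument y; recent librosa versions require it
mfccs = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)
print(mfccs.shape)  # (n_mfcc, number of frames)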
You can also display the MFCCs with the following code:
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
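Regarding the Django part of the question: the feature extraction itself does not depend on the web framework. What usually changes is that the audio arrives as an uploaded file object rather than as a path on disk. Below is a minimal sketch, assuming the upload is a 16-bit PCM WAV and is available as a file-like object (Django's uploaded files can be read like one). The function name read_wav_filelike is hypothetical.

```python
import io
import wave

import numpy as np

def read_wav_filelike(fileobj):
    """Return (sample_rate, samples) from a file-like object with 16-bit PCM WAV data."""
    with wave.open(fileobj, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV data")
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    # Interpret the raw bytes as little-endian signed 16-bit integers
    samples = np.frombuffer(raw, dtype="<i2")
    if channels > 1:
        # Average the channels to get a mono signal
        samples = samples.reshape(-1, channels).mean(axis=1)
    return sample_rate, samples
```

Inside a view this might be called as rate, audio = read_wav_filelike(request.FILES["audio"]), where the form field name "audio" is only an example; the resulting rate and audio can then be passed to whichever MFCC routine is used.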

Ali karimi
Code only answers are discouraged on SO, & less likely to be upvoted over time. Consider editing to add an explanation or links to documentation. Also, it's recommended to remove the word "simply" from answers. – SherylHohman Jan 17 '22 at 17:33