Questions tagged [mfcc]

Mel-Frequency Cepstral Coefficients: an alternate representation of a speech signal based on its frequency content. A very popular way to represent a speech signal as a feature vector, used primarily for speech recognition tasks.

Mel Frequency Cepstral Coefficients (MFCC) are coefficients obtained when a speech signal is analysed by a bank of filters whose center frequencies are equally spaced on the Mel scale (roughly linear in Hz below about 1 kHz and logarithmic above). This spacing is significant because it approximates the frequency resolution of the human ear. MFCC are computed from the magnitude mel-spectrogram by log-scaling it and then applying the Discrete Cosine Transform to obtain the cepstrum. MFCC are very popular for speech recognition tasks.
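
The pipeline described above (mel filter bank, log scaling, DCT) can be sketched in plain NumPy. This is a minimal illustration, not a reference implementation: the frame length (512), hop (256), 26 filters, 13 coefficients, the Hann window, and the HTK-style mel formula are all assumptions chosen for the example, and libraries such as librosa use different defaults.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct2_matrix(n_out, n_in):
    # First n_out rows of the orthonormal DCT-II basis.
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    # Split the signal into overlapping, windowed frames.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Magnitude spectrum -> mel spectrum -> log -> DCT (cepstrum).
    mag = np.abs(np.fft.rfft(frames, axis=1))
    mel_spec = mag @ mel_filterbank(sr, n_fft, n_mels).T
    log_mel = np.log(mel_spec + 1e-10)  # small offset avoids log(0)
    return log_mel @ dct2_matrix(n_mfcc, n_mels).T

# Example input: one second of a 200 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)
coeffs = mfcc(x, sr)  # one row per frame, one column per coefficient
```

Each row of `coeffs` is the feature vector for one frame. Keeping only the first 13 DCT coefficients discards fine spectral detail and retains the smooth spectral envelope.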

312 questions
4
votes
1 answer

How to use MFCC vectors for classifying a single audio file?

This is probably a very silly question, but I couldn't find details anywhere. So I have an audio recording (wav file) that is 3 seconds long. That is my sample and it needs to be classified as [class_A] or [class_B]. By following some tutorial on…
nnyjoh
  • 43
  • 1
  • 4
4
votes
2 answers

Library to train GMMs from MFCC

I am trying to build a basic emotion detector from speech using MFCCs, their deltas and delta-deltas. A number of papers talk about getting good accuracy by training GMMs on these features. I cannot seem to find a ready-made package to do the…
4
votes
1 answer

How to use MFCC features to train an SVM classifier for voice recognition?

I am currently in the discussion phase of a voice recognition project. I use MFCC feature extraction, but the MFCC feature returned from the function is a matrix, e.g. a (20, 38) feature matrix for each voice file (wav). But how can I pass this…
user1423164
  • 43
  • 1
  • 5
3
votes
1 answer

Understanding MFCC output for a simple sine wave

I generate a simple sine wave with a frequency of 200 and calculate an FFT to check that the obtained frequency is correct. Then I calculate the MFCC but do not understand what its output means. What is the explanation of the output, and where do I see…
codeDom
  • 1,623
  • 18
  • 54
3
votes
4 answers

Matching two series of MFCC coefficients

I have extracted two series of MFCC coefficients from two audio files, each around 30 seconds long, consisting of the same speech content. The audio files are recorded at the same location from different sources. An estimation should be made whether the audio…
Sney
  • 2,486
  • 4
  • 32
  • 48
3
votes
1 answer

Get timing information from MFCC generated with librosa.feature.mfcc

I am extracting MFCCs from an audio file using Librosa's function (librosa.feature.mfcc) and I correctly get back a numpy array with the shape I was expecting: 13 MFCC values for the entire length of the audio file, which is 1292 windows (in 30…
GiulioG
  • 369
  • 4
  • 15
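
For the timing question above, a frame index can be mapped to seconds as `frame * hop_length / sr`. The sketch below assumes librosa's default `sr=22050` and `hop_length=512`, which the excerpt does not state; with those values a 30-second file yields the 1292 frames mentioned. librosa also provides `librosa.frames_to_time` for the same conversion.

```python
import numpy as np

sr = 22050         # assumed sampling rate (librosa's default)
hop_length = 512   # assumed hop length (librosa's default)
n_frames = 1292    # number of MFCC frames reported in the question

# Start time of each frame in seconds.
frame_times = np.arange(n_frames) * hop_length / sr

# Frame count expected for 30 s of audio with centered framing.
expected_frames = 1 + (30 * sr) // hop_length
```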
3
votes
0 answers

sklearn.exceptions.NotFittedError: This LabelEncoder instance is not fitted yet

I'm trying to run voice recognition code from GitHub that analyzes voice. There is an example in final_results_gender_test.ipynb that illustrates the steps for both training and inference. So I copied and adjusted the inference part and…
Tina J
  • 4,983
  • 13
  • 59
  • 125
3
votes
1 answer

python tensorflow signal processing MFCC features

I'm testing the MFCC feature from tensorflow.signal implementation. According to the example (https://www.tensorflow.org/api_docs/python/tf/signal/mfccs_from_log_mel_spectrograms), it is computing all 80 mfccs and then taking the first 13. I have…
TYZ
  • 8,466
  • 5
  • 29
  • 60
3
votes
1 answer

Standardize a 3D NumPy array that has been padded with np.nan

I have a 3D matrix with a shape like (100, 40, 170). This matrix has been padded to reach the max length of 170 by filling up with np.nan (NaN). The values in the matrix represent MFCC coefficients from audio data extracted from the UrbanSound8K…
Eduardo G.R.
  • 377
  • 3
  • 18
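
One way to standardize such an array while ignoring the padding is to use NumPy's NaN-aware reductions. The sketch below is illustrative only: it uses random stand-in data, and it assumes the axes mean (files, coefficients, frames) and that each coefficient should be standardized across all files and frames, neither of which the excerpt confirms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data with the shape from the question: (files, coefficients, frames).
x = rng.normal(loc=5.0, scale=3.0, size=(100, 40, 170))

# Simulate NaN padding: each file has a different number of valid frames.
lengths = rng.integers(50, 171, size=100)
for i, n in enumerate(lengths):
    x[i, :, n:] = np.nan

# Per-coefficient statistics that skip the NaN padding.
mean = np.nanmean(x, axis=(0, 2), keepdims=True)
std = np.nanstd(x, axis=(0, 2), keepdims=True)

# Padded positions stay NaN; valid positions are standardized.
z = (x - mean) / std
```

If a downstream model cannot accept NaN, the padding can be replaced afterwards, for example with `np.nan_to_num(z, nan=0.0)`.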
3
votes
1 answer

What is the warning 'Empty filters detected in mel frequency basis. ' about?

I'm trying to extract MFCC features from an audio file with 13 MFCCs with the below code: import librosa as l x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000) n_fft = int(sr * 0.02) hop_length = n_fft // 2 mfccs…
3
votes
2 answers

How to get GFCC instead of MFCC in python?

Today I'm using MFCC from librosa in Python with the code below. It gives an array with dimension (40,40). import librosa sound_clip, s = librosa.load(filename.wav) mfcc=librosa.feature.mfcc(sound_clip, n_mfcc=40, n_mels=60) Is there a similar…
gynther
  • 69
  • 1
  • 9
3
votes
2 answers

Feature Extraction using MFCC

I want to know how to extract features from an audio signal (x.wav) using MFCC. I know the steps of audio feature extraction using MFCC. I want to know how to code it in Python using the Django framework.
Senthuja
  • 520
  • 1
  • 7
  • 19
3
votes
1 answer

librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(1025, 5341)

I am trying to separate voice from background noise in an audio file using Python and then extract MFCC features, but I get the error "librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(1025, 5341)". Here's the…
3
votes
1 answer

TypeError: 'module' object is not callable . MFCC

Working on a project based on speaker recognition using Python and getting the following error while finding MFCC. Traceback (most recent call last): File "neh1.py", line 10, in <module> complexSpectrum = numpy.fft(signal) TypeError: 'module'…
Neha
  • 43
  • 1
  • 1
  • 5
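
The traceback above calls `numpy.fft`, which is a module, not a function; the transform itself is `numpy.fft.fft`. A minimal sketch of the difference, using a made-up test signal (5 cycles over 64 samples) rather than the asker's data:

```python
import numpy as np

# Hypothetical test signal: exactly 5 cycles across 64 samples.
signal = np.sin(2 * np.pi * 5 * np.arange(64) / 64)

# Calling the module itself reproduces the error from the question.
try:
    np.fft(signal)
    raised = False
except TypeError:
    raised = True

# Calling the function inside the module works.
complex_spectrum = np.fft.fft(signal)

# Index of the strongest bin in the positive-frequency half.
peak_bin = int(np.argmax(np.abs(complex_spectrum[:32])))
```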
3
votes
1 answer

What are MFCC values?

So I know what MFCC is (Mel Frequency Cepstrum Coefficients). But I need to understand what each value is... Is it some sort of sound frequency value or what? Let's assume we have this kind of matrix. So each row represents the coefficients of one…
Nikas Žalias
  • 1,594
  • 1
  • 23
  • 51