Questions tagged [mfcc]

Mel-Frequency Cepstral Coefficients: an alternative representation of a speech signal based on its frequency content. It is a very popular way to represent a speech signal as a feature vector and is used primarily for speech recognition tasks.

Mel Frequency Cepstral Coefficients (MFCCs) are the coefficients obtained when a speech signal is analysed by a bank of filters whose center frequencies are evenly spaced on the mel scale, which makes them roughly linearly spaced in Hz at low frequencies and logarithmically spaced at higher ones. This choice of center frequencies is significant because it approximates the frequency resolution of the human ear. MFCCs are computed from the magnitude mel-spectrogram by log-scaling it and then applying the Discrete Cosine Transform to obtain the cepstrum. MFCCs are very popular for speech recognition tasks.
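As a rough illustration of the description above, here is a minimal sketch in plain Python. It assumes the commonly used mel formula 2595 * log10(1 + f / 700) and an unnormalized DCT-II; the function names are hypothetical and are not taken from any particular library, and real toolkits differ in filter shapes, scaling, and normalization.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale approximation (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    # Centers are evenly spaced on the mel scale, then mapped back to Hz.
    mel_min = hz_to_mel(f_min_hz)
    mel_max = hz_to_mel(f_max_hz)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]

def dct_ii(values, n_coeffs):
    # Unnormalized DCT-II, keeping only the first n_coeffs coefficients.
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def mfcc_from_mel_energies(mel_energies, n_coeffs, floor=1e-10):
    # Log-scale one frame of mel filter-bank magnitudes, then take the DCT.
    # The floor avoids log(0) for silent bands.
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_energies, n_coeffs)

if __name__ == "__main__":
    centers = mel_center_frequencies(10, 0.0, 8000.0)
    print([round(c) for c in centers])
    print(mfcc_from_mel_energies([1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 0.5], 4))
```

The printed center frequencies get further apart in Hz as frequency increases, even though they are equally spaced in mel. The second call treats its input list as one frame of mel filter-bank outputs; producing those outputs from audio (framing, windowing, FFT, triangular filters) is outside this sketch.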

312 questions
3
votes
1 answer

Spectrograms generated using Librosa don't look consistent with Kaldi?

I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, using 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram, visualized via the MATLAB imagesc function, appears as below: I am experimenting with…
kashkar
3
votes
0 answers

Meaning of MFCC

I have a conceptual problem. I know what the mel scale is and what it represents, and I know that this kind of spectrogram still has too much information for what I need. I think that if we want to reduce the amount of information in the spectrogram we use…
Anthos89
3
votes
1 answer

Train speech HMM from MFCC with Matlab hmmtrain

I have read many articles on this but I just do not understand how to proceed. I'm trying to build a basic speech recognition system that feeds MFCC features to an HMM, using the data available here. I'm using Matlab to do this. So far I…
3
votes
2 answers

Fastest method of MFCC extraction on a Linux machine

What is the fastest way of extracting MFCCs from audio files on Linux (a Raspberry Pi in my case)? I tried sphinx3 but it was slow for large files (on the Raspberry Pi). SFS (Speech Filing System) was quite fast on Windows but I could not install it on…
3
votes
3 answers

Use libxtract or other small C, C++ library for VAD functionality

I am trying to create a speaker identification system on Android. Currently I'm using libxtract to calculate the MFCC vector from frames and libsvm for classification. Do you have any idea how to use libxtract or another small C or C++ library that I can compile under…
Jack
3
votes
1 answer

Is there any MFCC library that can be used on Android?

My team is making a speech emotion-recognition app. To get MFCCs, we use the comirva package. The problem is that the AudioInputStream needed to create an AudioPreProcessor can't be used on Android, so we have been looking for an alternative. Is there…
joejo
2
votes
0 answers

Why does applying the Hamming window to framed data show a consistent difference in behavior between Python and C?

This is the code I wrote in Python that extracts data from a .wav file, applies pre-emphasis, divides it into frames of 0.025 s with a 0.010 s stride, and applies a Hamming window: import scipy.io.wavfile as wavfile import numpy as np samplerate, data =…
FloopyBeep
2
votes
0 answers

Mel-spectrogram vs MFCC for Automatic Speech Recognition

I am trying to do Automatic Speech Recognition using a CNN. For the feature extraction I am using MFCCs. I have read many articles; some of them say that with a lot of data and classifiers like CNNs, mel spectrograms are better, while others say MFCC is…
2
votes
1 answer

Is my output of librosa MFCC correct? I think I get the wrong number of frames when using librosa MFCC

result=librosa.feature.mfcc(signal, 16000, n_mfcc=13, n_fft=2048, hop_length=400) result.shape() The signal is 1 second long with a sampling rate of 16000, and I compute 13 MFCCs with a hop length of 400. The output dimensions are (13,41). Why do I get 41…
Rasula
2
votes
0 answers

MFCC Normalization for DL

I have a dataset containing MFCC features as input for a deep learning model. When I look at my MFCCs, they have widely varying ranges of values (e.g. (-100,200),(0,5),(-1,1),...). I would like to normalize them so that they are suited for…
Flitschi
2
votes
1 answer

What are the components of the Mel mfcc

In looking at the output of this line of code: mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40) print("MFCC Shape = ", mfccs.shape) I get a response of MFCC Shape = (40,1876). What do these two numbers represent? I…
Joe
2
votes
2 answers

Relation between hop_length, win_length, frame_length, n_fft, and number of frames

I am working with MFCC features in Python via librosa: mfccs = librosa.feature.mfcc(y=y,sr=sr,n_mfcc=12,n_fft=320,hop_length=320,htk=True) Here, I took an audio signal of 1 s duration, which gave me len(y) = 16000, hence I took sr = 16000. I calculated…
Pranaswi Reddy
2
votes
1 answer

Librosa's inverse mel spectrogram to stft taking a long time

I am currently trying to convert a mel spectrogram back into an audio file. However, librosa's mel_to_stft function is taking a long time (upwards of 15 minutes) to read in a 30-second .wav file sampled at 384 kHz. The following is my code: # Code…
Sam
2
votes
1 answer

What is the second number in the MFCCs array?

When I extract MFCCs from an audio file, the output is (13, 22). What does the second number represent? Is it the number of time frames? I use librosa. The code I use is: mfccs = librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13,…
ioan_bl
2
votes
2 answers

What are the differences between MFCC and BFCC?

I have implemented the MFCC algorithm and want to implement BFCC. What are the differences between them, and is it enough just to use another function in place of the frequency-to-mel (2595 * Math.log10(1 + frequency / 700) ) and mel-to-frequency functions…
kamaci