Questions tagged [mfcc]

Mel-Frequency Cepstral Coefficients (MFCCs): an alternate representation of a speech signal based on its frequency content. MFCCs are a very popular way to represent a speech signal as a feature vector and are used primarily for speech recognition tasks.

Mel-Frequency Cepstral Coefficients (MFCCs) are the coefficients obtained when a speech signal is analysed by a bank of filters whose center frequencies are equally spaced on the mel scale (roughly linear in Hz at low frequencies and logarithmic at higher ones). This choice of center frequencies is significant because it approximates the frequency resolution of the human ear. MFCCs are computed from the magnitude mel-spectrogram by log-scaling it and then applying the Discrete Cosine Transform to obtain the cepstrum. MFCCs are very popular for speech recognition tasks.
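
The pipeline described above (mel filter bank, log scaling, DCT) can be sketched in plain NumPy. This is a minimal illustration, not a reference implementation: the function names (hz_to_mel, mel_filterbank, dct_ii, mfcc), the frame and filter counts, the Hann window, and the 2595 * log10(1 + f / 700) mel formula are all assumptions chosen for the example. Libraries such as librosa use different defaults and normalizations, so the numbers will not match theirs.

```python
import numpy as np

def hz_to_mel(hz):
    # One common mel formula (assumed here); other variants exist.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lower, center, upper = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lower) / (center - lower)
        falling = (upper - fft_freqs) / (upper - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_out):
    # Unnormalized DCT-II along axis 0, keeping the first n_out coefficients.
    n_in = x.shape[0]
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    return basis @ x

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    signal = np.asarray(signal, dtype=float)
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Magnitude spectrum per frame, shape (n_frames, n_fft // 2 + 1).
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Magnitude mel-spectrogram, shape (n_mels, n_frames).
    mel_spec = mel_filterbank(n_mels, n_fft, sr) @ mag.T
    # Log scaling (small offset avoids log(0)), then DCT to get the cepstrum.
    log_mel = np.log(mel_spec + 1e-10)
    return dct_ii(log_mel, n_mfcc)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz sine
    print(mfcc(tone, sr).shape)
```

The result has one row per coefficient and one column per frame, so two recordings of different lengths produce matrices with different numbers of columns.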

312 questions
6
votes
1 answer

How can I obtain the raw audio frames from the microphone in real-time or from a saved audio file in iOS?

I am trying to extract MFCC vectors from the audio signal as input into a recurrent neural network. However, I am having trouble figuring out how to obtain the raw audio frames in Swift using Core Audio. Presumably, I have to go low-level to get…
macklinagent
  • 75
  • 1
  • 6
5
votes
1 answer

Librosa: Cannot provide window function for mfcc on Windows

I'm currently experimenting with librosa to reproduce a scientific approach (deep learning) that used PRAAT to extract the MFCCs of audio files. I'm not that experienced with phonetics/acoustics and I had a lot of issues understanding PRAAT - so I…
Keanri
  • 96
  • 9
5
votes
2 answers

Comparing MFCC feature vectors with DTW

I'm looking for some advice on Dynamic Time Warping (DTW). I have a Python script and extract Mel-Frequency Cepstral Coefficient (MFCC) feature vectors from .WAV files of various lengths. The feature vectors are arrays of varying lengths that…
amitchone
  • 1,630
  • 3
  • 21
  • 45
5
votes
2 answers

How to get MFCC with TarsosDSP?

I searched everywhere and I couldn't figure out how to extract MFCC features using TarsosDSP on Android. I know how to get the FFT out of a file. Any help?
Hassan Pezeshk
  • 343
  • 5
  • 16
5
votes
4 answers

Librosa mel filter bank decreasing triangles

I'm a bit stuck understanding MFCCs. From what I have read the mel filter banks should be a series of triangles that get wider and their peaks are at the same place. Like this... However when I compute the mel filter banks using librosa I…
Jack Deadman
  • 166
  • 2
  • 8
5
votes
2 answers

How to compare two MFCC feature vectors, or measure the similarity between the MFCC feature vectors of two speech utterances

I have extracted 13 MFCC features of two utterances. Feature set for first utterance is of size 11*13 and other is 18*13. So, how to compare two feature sets to find the similarity between these two words? I am not using any classifier, if someone…
5
votes
1 answer

MATLAB code for calculating MFCC

I have a question if that's ok. I was recently looking for an algorithm to calculate MFCCs. I found a good tutorial rather than code so I tried to code it by myself. I still feel like I am missing one thing. In the code below I took FFT of a signal,…
Celdor
  • 2,437
  • 2
  • 23
  • 44
4
votes
1 answer

How to train HMM with audio sentences dataset for speech recognition?

I have read some journals and papers on HMM and MFCC, but I am still confused about how it works step by step with my dataset (an audio dataset of sentences). My dataset example (audio form): hello good morning good luck for you exam etc about 343 audio…
4
votes
1 answer

How to combine mfcc vector with labels from annotation to pass to a neural network

Using librosa, I created mfcc for my audio file as follows: import librosa y, sr = librosa.load('myfile.wav') print y print sr mfcc=librosa.feature.mfcc(y=y, sr=sr) I also have a text file that contains manual annotations[start, stop, tag]…
DJ_Stuffy_K
  • 615
  • 2
  • 11
  • 29
4
votes
1 answer

Python implementation of MFCC algorithm

I have a database which contains video streams. I want to calculate the LBP features from the images and MFCC from the audio, and for every frame in the video I have some annotation. The annotation is inlined with the video frames and the time of the video.…
konstantin
  • 853
  • 4
  • 16
  • 50
4
votes
5 answers

Is it possible to get exactly the same results from TensorFlow MFCC and librosa MFCC?

I'm trying to make TensorFlow MFCC give me the same results as Python librosa MFCC. I have tried to match all the default parameters that are used by librosa in my TensorFlow code and got a different result. This is the TensorFlow code that I have…
Eli Leszczynski
  • 145
  • 1
  • 7
4
votes
1 answer

How does MFCC feature size affect a recurrent neural network

So I'm learning machine learning and wanted to know how MFCC feature size affects an RNN (Recurrent Neural Network). With librosa I extracted MFCCs and then delta coefficients, and after that I get an array of dimension [13, sound_length]. The code of…
4
votes
1 answer

Python audio signal classification MFCC features neural network

I am trying to classify audio signals from speech to emotions. For this purpose I am extracting MFCC features of the audio signal and feeding them into a simple neural network (FeedForwardNetwork trained with BackpropTrainer from PyBrain).…
cowhi
  • 2,165
  • 2
  • 17
  • 21
4
votes
2 answers

How to get MFCC features with Octave

My goal is to create a program in Octave that loads an audio file (wav, flac), calculates its MFCC features and serves them as output. The problem is that I do not have much experience with Octave and cannot get Octave to load the audio file, and that is why…
4
votes
1 answer

How to Extract MFCC features in Java

I am working on converting a speech recognition project from MATLAB to Java code. I have been able to read the .wav files (as vectors of values in the range -1 to 1) using the java example provided here. This works exactly as the wavread function in…
Tryxo
  • 41
  • 1
  • 3