Questions tagged [mfcc]

Mel-Frequency Cepstral Coefficients (MFCCs) are an alternate representation of a speech signal based on its frequency content. They are a very popular way to represent a speech signal as a feature vector and are used primarily for speech recognition tasks.

Mel Frequency Cepstral Coefficients (MFCC) are coefficients obtained when a speech signal is analysed by a bank of filters whose center frequencies are evenly spaced on the Mel scale, which makes them roughly linearly spaced in Hz at low frequencies and logarithmically spaced at higher frequencies. This choice of center frequencies is significant because it approximates the frequency resolution of the human ear. MFCC are computed from the magnitude mel-spectrogram by log-scaling it and then applying the Discrete Cosine Transform to obtain the cepstrum. MFCC are very popular for speech recognition tasks.
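The steps in the description above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names are hypothetical, the Hz-to-mel conversion uses the common 2595 * log10(1 + f / 700) formula (other definitions exist), and the DCT is an unnormalised DCT-II, so the values will not match any particular library exactly.

```python
import math


def hz_to_mel(hz):
    # One common mel-scale definition (assumption: libraries may use others).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies in Hz that are evenly spaced on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


def dct_ii(values):
    """Unnormalised DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def mfcc_from_mel_energies(mel_energies, n_coeffs, floor=1e-10):
    """Log-scale one frame of mel filter bank outputs, then apply the DCT."""
    # The floor avoids taking log(0) for silent bands.
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_energies)[:n_coeffs]
```

Called on one frame of filter bank outputs, `mfcc_from_mel_energies(frame, 13)` returns the first 13 coefficients; the gaps between the values returned by `mel_center_frequencies` grow with frequency when measured in Hz.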

312 questions
2
votes
1 answer

MFCC with Java Linear and Logarithmic Filters

I am implementing the MFCC algorithm in Java. There is sample code for triangular filters and MFCC in Java; here is the link: MFCC Java. However, I should follow this code written in Matlab: MFCC Matlab. My question is that in the Matlab code it talks…
kamaci
  • 72,915
  • 69
  • 228
  • 366
2
votes
0 answers

How to fix broken data in feature extraction/pre-processing in speech recognition?

I am very new to machine learning. I stumbled on this source code on GitHub that has no database, so I decided to use my own database. This code recognizes speakers with MFCC and GMM-UBM. But when I try to run the code, I get this error…
lorita
  • 21
  • 2
2
votes
1 answer

How to save arrays in a .npz structure compatible with FBK Fairseq for Direct Speech Translation?

I generated a npz folder with numpy with the code np.savez(outpath + "/data.npz", **keywords) where keywords is a dictionary structured as: "0" : array "1" : array Each array is a 2D array containing MFCC features extracted with speechpy. For…
gdc
  • 21
  • 5
2
votes
1 answer

Preparing MFCC audio feature- Should all WAV files be at same length?

I would like to prepare an Audio-dataset for a machine learning model. Each .wav file should be represented as an MFCC image. While all of the images will have the same MFCC amount (= 20), the lengths of the .wav files are between 3-5…
21kc
  • 23
  • 5
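One way to get equally sized MFCC inputs from clips of different lengths is to pad or truncate along the time axis. The sketch below assumes each MFCC matrix is a list of coefficient rows (n_mfcc rows, one value per frame); `pad_or_truncate` and `target_frames` are hypothetical names, and zero padding is only one of several possible choices.

```python
def pad_or_truncate(mfcc, target_frames, pad_value=0.0):
    """Return a copy of mfcc whose rows all have exactly target_frames values."""
    fixed = []
    for row in mfcc:
        # Keep at most target_frames values from the start of the row.
        new_row = list(row[:target_frames])
        # Append padding if the clip was shorter than the target.
        new_row.extend([pad_value] * (target_frames - len(new_row)))
        fixed.append(new_row)
    return fixed
```

Choosing `target_frames` from the longest clip keeps all the data at the cost of padding; choosing it from the shortest clip avoids padding but discards frames.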
2
votes
0 answers

How to correctly unpickle a file (ModuleNotFoundError)?

I saved a model using Pickle using this code below: picklefile = path.split("-")[0]+".gmm" Pickle.dump(gmm,open(dest + picklefile,'w')) print '+ modeling completed for person:',picklefile," with data point = ",list_features.shape list_features =…
2
votes
1 answer

MFCC feature extraction, Librosa

I want to extract MFCC features from an audio file sampled at 8000 Hz, with a frame size of 20 ms and an overlap of 10 ms. What must the parameters for the librosa.feature.mfcc() function be? Does the code written below specify 20ms chunks with 10ms…
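The arithmetic behind this kind of question can be sketched as follows. Frame-based feature extractors generally expect the window and hop sizes in samples rather than milliseconds; `frame_params` is a hypothetical helper, and the sketch assumes the hop is the frame size minus the overlap.

```python
def frame_params(sample_rate, frame_ms, overlap_ms):
    """Convert a frame size and overlap in milliseconds to sample counts."""
    frame_length = round(sample_rate * frame_ms / 1000)
    # The hop is the distance between the starts of consecutive frames.
    hop_length = round(sample_rate * (frame_ms - overlap_ms) / 1000)
    return frame_length, hop_length


frame_length, hop_length = frame_params(8000, 20, 10)
print(frame_length, hop_length)  # 160 80
```

With librosa, values like these would typically be passed as the `n_fft` and `hop_length` keyword arguments, both expressed in samples.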
2
votes
0 answers

What method does Librosa use to calculate Delta-MFCC?

I am trying to generate the delta-MFCCs. Apparently there are several implementations. I found the "regression" formula link here. But I don't understand why Librosa uses a Savitzky-Golay filter, which is a smoothing filter. I have not found any…
Satashree Roy
  • 365
  • 2
  • 9
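For reference, the "regression" delta formula is usually written as d_t = sum_{k=1..N} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..N} k^2). A minimal sketch follows, assuming frames are stored as a list of per-frame coefficient lists and that edges are handled by repeating the first and last frame (implementations differ on this). A Savitzky-Golay filter configured for a first derivative with a first-order polynomial fits a straight line over the same window, so it should give the same slope away from the edges.

```python
def delta(frames, n=2):
    """Regression-based deltas over a window of +/- n frames."""
    num_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    result = []
    for t in range(num_frames):
        row = []
        for c in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                # Clamp indices so edge frames are repeated (an assumption).
                ahead = frames[min(t + k, num_frames - 1)][c]
                behind = frames[max(t - k, 0)][c]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result
```

Applying `delta` to its own output gives delta-delta (acceleration) features under the same assumptions.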
2
votes
1 answer

How to make 3 dimensional array for CNN input python

I am trying to train a CNN to recognize emotion in speech. For this I am using the mel-cepstral coefficients (MFCC), which represent each audio file as a two-dimensional array (number of frames * number of MFCC coefficients). I want to have a…
2
votes
1 answer

normalizing mel spectrogram to unit peak amplitude?

I am new to both python and librosa. I am trying to follow this method for a speech recognizer: acoustic front end My code: import librosa import librosa.display import numpy as np y, sr = librosa.load('test.wav', sr = None) normalizedy =…
sabri
  • 23
  • 1
  • 8
2
votes
0 answers

GMM and MFCC for language identification

I am new to the machine learning domain. Currently, I am trying to implement an audio language detection system, based on MFCC, delta, delta-delta and Mel Spectrum Coefficients of any audio file. These features are extracted using librosa. Librosa…
Amit K.S
  • 21
  • 2
2
votes
2 answers

Transition between Audiosegment object and a wave file/data

I am extracting MFCC features from mp3 voice files, but I want to keep the source files unchanged and avoid adding any new files. My processing includes the following steps: Load .mp3 file, eliminate silence, and generate .wav data using…
SuperKogito
  • 2,998
  • 3
  • 16
  • 37
2
votes
1 answer

AttributeError: 'Series' object has no attribute 'label'

I'm trying to follow a tutorial on sound classification in neural networks, and I've found 3 different versions of the same tutorial, all of which work, but they all reach a snag at this point in the code, where I get the "AttributeError: 'Series'…
ZeLobster
  • 33
  • 1
  • 6
2
votes
0 answers

How to use MFCC TarsosDSP with microphone in android

In this example (answer): How to get MFCC with TarsosDSP? they show how to use MFCC on Android in a @Test from a float array. I'm trying to use it with data from the microphone: int sampleRate = 44100; int bufferSize = 8192; int bufferOverlap =…
2
votes
1 answer

generate mfcc's for audio segments based on annotated file

My main goal is in feeding mfcc features to an ANN. However I am stuck at the data pre processing step and my question has two parts. BACKGROUND : I have an audio. I have a txt file that has the annotation and time stamp like this: 0.0 2.5…
kRazzy R
  • 1,561
  • 1
  • 16
  • 44
2
votes
1 answer

ValueError: could not broadcast input array from shape (20,590) into shape (20)

I am trying to extract features from .wav files by using the MFCCs of the sound files. I am getting an error when I try to convert my list of MFCCs to a numpy array. I am quite sure that this error is occurring because the list contains MFCC values…
Sreehari R
  • 919
  • 4
  • 11
  • 21
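A broadcasting error of this kind can arise when the MFCC matrices in a list have different numbers of frames, so they cannot be stacked into one rectangular array; whether that is the cause here is an assumption, since the excerpt is truncated. One possible workaround, sketched below with a hypothetical helper, is to summarise each matrix by the mean of every coefficient over time, so that each file yields a vector of the same length.

```python
def mean_over_frames(mfcc):
    """Collapse an (n_mfcc x n_frames) matrix to one mean value per coefficient."""
    return [sum(row) / len(row) for row in mfcc]


# Two clips with the same number of coefficients but different frame counts.
clip_a = [[1.0, 3.0], [2.0, 4.0]]
clip_b = [[1.0, 2.0, 3.0], [0.0, 0.0, 3.0]]
features = [mean_over_frames(m) for m in (clip_a, clip_b)]
print(features)  # [[2.0, 3.0], [2.0, 1.0]]
```

Averaging discards the temporal structure, so it suits classifiers that expect one fixed-length vector per file rather than models that consume the frame sequence.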