
SHORT AND SIMPLE: What steps are involved in getting MFCCs from an FFT?

DETAILED:

I'm working on a drum application to classify sounds. It's a matching application for the iPhone that uses the openFrameworks library for sound processing. The idea is to return the name of the note that you play on the loud Indian drum known as the Dhol, on which only a few notes are playable.

I've implemented the FFT algorithm and successfully obtained a spectrum. I now want to take it one step further and compute the MFCCs from the FFT.

This is what I understand so far: the MFCCs are based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

It uses a bank of overlapping triangular filters to group the frequencies into bands, from which the coefficients are derived. http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png

So if the FFT returns around 1000 values (the spectrum of the sound), you should ideally end up with around 12 elements (i.e., coefficients). This 12-element vector is then used to classify the instrument, including which drum note was played.

This is all I'm trying to achieve.

Could someone please explain how to do something like this? Any help would be greatly appreciated. Cheers

Pavan
  • Generally I am loath to cite Wikipedia for anything technical, but doesn't [this page](http://en.wikipedia.org/wiki/Mel-frequency_cepstral_coefficient) basically give you the steps to obtain the coefficients? – Dan Apr 30 '11 at 16:18

1 Answer


First, you have to split the signal into short frames of 10 to 30 ms, apply a window function (Hamming is recommended for audio applications), and compute the Fourier transform of each frame. Starting from the DFT, compute the Mel Frequency Cepstral Coefficients with these steps:

  1. Get the power spectrum: |DFT|^2
  2. Apply a triangular filter bank to map the Hz scale onto the mel scale
  3. Take the log of the filter bank outputs
  4. Apply the discrete cosine transform

A Python code example:

import math

import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

numCoefficients = 13  # choose the size of the MFCC array
minHz = 0
maxHz = 22050  # Nyquist frequency, assuming a 44.1 kHz recording


def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)


def melToFreq(mel):
    return 700 * (math.exp(mel / 1127.01048) - 1)


def melFilterBank(blockSize):
    # blockSize is the number of spectrum bins between 0 Hz and 22050 Hz
    numBands = int(numCoefficients)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    melRange = numpy.array(range(numBands + 2))

    # Filter edges and centers, equally spaced on the mel scale
    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # Convert mel values to bin indices; each array index
    # represents the center of one triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # get int values

    for i in range(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(range(start, centre)) - start) / k1  # rising edge
        down = (end - numpy.array(range(centre, end))) / k2  # falling edge

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()


# file.wav is assumed to hold one mono frame of 10 to 30 ms
sampleRate, signal = wavfile.read("file.wav")

# rfft keeps only the bins from 0 Hz to Nyquist, matching the filter bank
complexSpectrum = numpy.fft.rfft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank(len(powerSpectrum)))
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)

This code is based on the MFCC Vamp example. I hope this helps!
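The example above treats its input as a single frame. For a longer recording, the framing and windowing step described at the start of this answer has to come first. The following is only a minimal sketch of that step, assuming a mono signal. The helper names `frameSignal` and `framePowerSpectra`, the 25 ms frame length, and the 10 ms hop between frames are illustrative assumptions, not part of the original code.

```python
import numpy


def frameSignal(signal, sampleRate, frameMs=25, hopMs=10):
    """Split a mono signal into overlapping, Hamming-windowed frames.

    frameMs and hopMs are assumed defaults, not values from the answer.
    """
    frameLen = int(sampleRate * frameMs / 1000)
    hopLen = int(sampleRate * hopMs / 1000)
    if len(signal) < frameLen:
        return numpy.zeros((0, frameLen))
    numFrames = 1 + (len(signal) - frameLen) // hopLen
    window = numpy.hamming(frameLen)
    frames = numpy.zeros((numFrames, frameLen))
    for i in range(numFrames):
        start = i * hopLen
        frames[i] = signal[start:start + frameLen] * window
    return frames


def framePowerSpectra(frames):
    """Power spectrum |DFT|^2 of each frame, bins from 0 Hz to Nyquist."""
    return numpy.abs(numpy.fft.rfft(frames, axis=1)) ** 2


# Synthetic test input: one second of a 440 Hz tone at 44.1 kHz
sampleRate = 44100
t = numpy.arange(sampleRate) / sampleRate
tone = numpy.sin(2 * numpy.pi * 440 * t)

frames = frameSignal(tone, sampleRate)
spectra = framePowerSpectra(frames)  # one row of power values per frame
```

Each row of `spectra` can then go through the filter bank, log, and DCT steps, giving one MFCC vector per frame.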

alfakini
  • Hi, do you mean for "file.wav" to be a single frame (10 ms to 30 ms)? If not, you need to split the signal into small frames and then apply these operations to each frame. For each frame, you should get 13 coefficients. – engineerchuan May 03 '11 at 03:36
  • ... I was confused by that too. I assumed he was talking about the size of the window, i.e. where we grab the values and then compute the FFT on them. Please confirm – Pavan May 03 '11 at 07:06
  • But what happens once I have the coefficients? What do I do with them? I'm assuming I get the coefficients of sound 1 and then the coefficients of sound 2... then what? – Pavan May 03 '11 at 23:34
  • Sorry! I extracted it from my research code and forgot to make that clear. Consider "file.wav" to be a sound frame of 10 ms to 30 ms! I think just having the coefficients isn't enough. You need to pass the MFCCs to an algorithm that classifies them. I'm using a back-propagation neural network here to classify percussive sounds. An interesting project that uses MFCCs to classify drum and other percussive sounds is [William Brent's timbreID](http://williambrent.conflations.com/pages/research.html) – alfakini May 05 '11 at 04:52
  • There's a bug in the melToFreq() function, the -1 should be outside the inner parentheses – jytoronto Sep 06 '15 at 03:07
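
On the question in the comments about what to do with the coefficients once you have them: one simple option is to compare a new MFCC vector against the average vector of each known note. The sketch below uses nearest-centroid matching with Euclidean distance, which is a simpler technique than the back-propagation neural network mentioned above. The function names, the labels `note_a` and `note_b`, and the 3-element vectors are made up for illustration; real input would be 13-element MFCC vectors from recorded examples.

```python
import numpy


def trainCentroids(examples):
    """examples: dict mapping a label to a list of MFCC vectors.

    Returns a dict mapping each label to the mean of its vectors.
    """
    return {label: numpy.mean(numpy.asarray(vectors, dtype=float), axis=0)
            for label, vectors in examples.items()}


def classify(mfcc, centroids):
    """Return the label whose centroid is closest (Euclidean) to mfcc."""
    mfcc = numpy.asarray(mfcc, dtype=float)
    return min(centroids,
               key=lambda label: numpy.linalg.norm(mfcc - centroids[label]))


# Made-up 3-element vectors standing in for real 13-element MFCCs
examples = {
    "note_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "note_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
centroids = trainCentroids(examples)
label = classify([0.8, 0.2, 0.0], centroids)
```

With only a few playable notes this kind of matching may be enough to start with; a trained classifier such as a neural network is the heavier alternative.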