I am developing back-end speech recognition software into which the user can import mp3 files. How can I extract features from such a digital audio file? Should I convert it back to analog first?

Allen Pol

2 Answers

Your question is unclear because you are using the terms analog and digital incorrectly. An analog signal is a real-world, continuous quantity, e.g. voltage or pressure. A digital signal is a discrete-time (sampled) and quantized version of the analog signal. An mp3 file is already digital, so there is no need to convert it back to analog. You must calculate the FFT of your audio frames when calculating the MFCCs. MFCCs can only be extracted from the digital signal; it is practically impossible to do this with the analog one.
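To make the sampled/quantized distinction concrete, here is a minimal sketch, assuming NumPy is available. The sample rate, tone frequency and 16-bit depth are illustrative values chosen here, not anything required by mp3 or MFCCs.

```python
import numpy as np

SAMPLE_RATE = 16000   # samples per second (assumed value)
NUM_SAMPLES = 160     # 10 ms of audio at 16 kHz
FREQ = 440.0          # tone frequency in Hz (assumed value)

# Sampling: the continuous-time tone is evaluated only at the instants n / SAMPLE_RATE.
n = np.arange(NUM_SAMPLES)
samples = np.sin(2 * np.pi * FREQ * n / SAMPLE_RATE)

# Quantization: each amplitude is rounded to one of the 16-bit integer levels.
pcm16 = np.round(samples * 32767).astype(np.int16)
```

A decoded mp3 gives you an array of this kind, and that array is what the feature extraction works on.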

If you are asking whether it is possible to extract MFCCs from an mp3 file, then yes, it is possible. Once the file is decoded to raw samples, all you need to do is run the standard algorithm to get your features. The details are beyond the scope of this question, but the steps are:

  1. Calculate the FFT of each frame of data.
  2. Calculate the power spectrum (a PSD estimate) by squaring the magnitudes of the FFT coefficients.
  3. Apply the mel filterbank and sum the energy within each filter.
  4. Calculate the logarithm of each filterbank energy.
  5. Calculate the DCT of the log energies.
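
The steps above can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not a reference implementation: the function names, the 25 ms / 10 ms framing at 16 kHz, the Hamming window, the 26 filters and 13 coefficients, and the 2595/700 mel formula are all choices made here. It also assumes the mp3 has already been decoded into a 1-D array of samples; decoding is not shown.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (assumed variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 1. FFT of each (zero-padded) frame.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    # 2. Power spectrum: squared magnitude of the FFT coefficients.
    power = np.abs(spectrum) ** 2 / n_fft
    # 3. Mel filterbank: weighted sum of the power inside each triangular filter.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 4. Logarithm of each filterbank energy (small offset avoids log(0)).
    log_energies = np.log(energies + 1e-10)
    # 5. DCT-II of the log energies, keeping the first n_ceps coefficients.
    k = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (k + 0.5) / n_filters)
    return log_energies @ basis.T

# Usage: one second of a 440 Hz tone stands in for decoded audio.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
features = mfcc(tone, sr)
print(features.shape)  # (98, 13): 98 frames, 13 coefficients per frame
```

The DCT is written out as an explicit cosine matrix only to keep the sketch free of extra dependencies; a library DCT routine would do the same job up to scaling.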
jojeck
  • So does that mean I don't have to go through spectral analysis anymore? I'm new to this field, and most sources I've found describe the MFCC process as including a conversion from an analog signal to a digital signal. – Allen Pol May 27 '15 at 01:38
  • I suggest you first read some DSP fundamentals, chapter 1. You are using the terms *analog* and *digital* incorrectly. An analog signal is a real-world, continuous quantity, e.g. voltage or pressure. A digital signal is a discretized (sampled) and quantized version of the analog signal. You **must** calculate the FFT of your audio frames when calculating the MFCCs. – jojeck May 27 '15 at 08:19
  • MFCC is short for mel-frequency cepstral coefficients, and computing them requires the FFT (or DFT). – mmoment May 27 '15 at 08:23

You're confusing things here. As @jojek said, you can do all of that with the digital signal. Here is a pretty spot-on tutorial:

http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

This one is more practical:

http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf

From Wikipedia: [http://en.wikipedia.org/wiki/Mel-frequency_cepstrum]

MFCCs are commonly derived as follows:[1][2]

  • Take the Fourier transform of (a windowed excerpt of) a signal. (In other words, the short-time Fourier transform.)

  • Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows. (The calculation is described in the links above.)

  • Take the logs of the powers at each of the mel frequencies.

  • Take the discrete cosine transform of the list of mel log powers, as if it were a signal.

  • The MFCCs are the amplitudes of the resulting spectrum.
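
For reference, the mel mapping and the final transform in the list above can be written out as follows. This is one common convention, not the only one: the constants 2595 and 700 belong to a widely used variant of the mel formula, and the symbols $E_k$ (energy in the $k$-th mel filter), $K$ (number of filters) and $c_n$ (the $n$-th coefficient) are names introduced here for illustration.

```latex
% Frequency f in Hz mapped to mels (one common variant)
m(f) = 2595 \, \log_{10}\!\left(1 + \frac{f}{700}\right)

% DCT-II of the log filterbank energies (unnormalized form)
c_n = \sum_{k=0}^{K-1} \log(E_k) \,
      \cos\!\left[\frac{\pi n}{K}\left(k + \tfrac{1}{2}\right)\right],
\qquad n = 0, 1, \dots, N-1
```

Different toolkits scale the DCT differently, so coefficients from two implementations may differ by a constant factor.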

And here is a MATLAB toolbox to help you understand it better:

http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

mmoment