Understanding Formants from Audio Signal

Question

I went through the MATLAB tutorial on Formant Estimation using LPC Coefficients. Though I vaguely understand the details, it's not entirely clear to me why we need to do this. From http://person2.sol.lu.se/SidneyWood/praate/whatform.html:

A formant is a concentration of acoustic energy around a particular frequency in the speech wave

Why is it not enough to take the DFT of the audio signal (after some pre-processing if necessary)? In the frequency domain, the peaks correspond to these concentrations, correct?

There is http://dsp.stackexchange.com for such kind of questions. — Nikolay Shmyrev, Oct 22 '16 at 19:34

score 1 · Answer 1 · answered Aug 17 '17 at 17:12

A bit late to the game, but for anyone stumbling upon this:

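The DFT (or FFT) of a speech signal shows the spectrum of the signal itself. For voiced speech, its narrow peaks are the harmonics of the voice source, i.e. multiples of the fundamental frequency produced by the vocal folds, not the resonances of the vocal tract. The formants only show up as the broader envelope that sets the heights of those harmonics, so the strongest DFT peak is usually the harmonic closest to a formant rather than the formant itself.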

The formant algorithm you mention, by Roy Snell, uses an autocorrelation method to find the resonances of the vocal tract, and those resonances are the formants. The LPC spectrum is basically a smoothed FFT/DFT in which the harmonic detail is gone and the remaining peaks represent the formants. A lot of variables go into the LPC calculation (for example the model order, the analysis window and any pre-emphasis), and they change how strong the smoothing is.
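To make the difference concrete, here is a rough sketch in plain Python (standard library only). It is not the MATLAB tutorial's code and not Snell's exact algorithm: it picks peaks on the LPC envelope instead of solving for the roots of the LPC polynomial. The test signal is made up for illustration: an impulse train at 100 Hz passed through two resonators at 500 Hz and 1500 Hz. The sample rate, pitch, bandwidth and all function names are my own assumptions.

```python
import cmath
import math

FS = 8000                        # sample rate in Hz (assumed)
F0 = 100                         # pitch of the synthetic voice in Hz (assumed)
TRUE_FORMANTS = (500.0, 1500.0)  # resonances built into the test signal
BANDWIDTH = 80.0                 # bandwidth of each resonance in Hz (assumed)


def synth_vowel(n_samples):
    """Impulse train at F0 passed through one two-pole resonator per formant."""
    period = FS // F0
    x = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    r = math.exp(-math.pi * BANDWIDTH / FS)  # pole radius for this bandwidth
    for fc in TRUE_FORMANTS:
        a1 = -2.0 * r * math.cos(2.0 * math.pi * fc / FS)
        a2 = r * r
        y = [0.0] * n_samples
        for n in range(n_samples):
            y1 = y[n - 1] if n >= 1 else 0.0
            y2 = y[n - 2] if n >= 2 else 0.0
            y[n] = x[n] - a1 * y1 - a2 * y2
        x = y
    return x


def hamming(frame):
    """Apply a Hamming window to a frame."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]


def dtft_mag(x, freq_hz):
    """Magnitude of the Fourier transform of x at a single frequency."""
    w = 2.0 * math.pi * freq_hz / FS
    return abs(sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(x)))


def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                   # remaining prediction error
    return a


def lpc_envelope(a, freq_hz):
    """Magnitude of the all-pole model 1/A(z) at one frequency (gain omitted)."""
    return 1.0 / dtft_mag(a, freq_hz)


def envelope_peaks(a, step_hz=10):
    """Frequencies (Hz) of the local maxima of the LPC envelope."""
    freqs = list(range(0, FS // 2 + 1, step_hz))
    env = [lpc_envelope(a, f) for f in freqs]
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if env[i - 1] < env[i] >= env[i + 1]]


signal = synth_vowel(800)
frame = hamming(signal[400:])   # skip the start-up transient
coeffs = lpc(frame, order=2 * len(TRUE_FORMANTS))  # two poles per formant
formants = envelope_peaks(coeffs)

print("LPC envelope peaks (Hz):", formants)
for f in (600, 650, 700):
    print(f, "Hz  DFT:", round(dtft_mag(frame, f), 3),
          " LPC envelope:", round(lpc_envelope(coeffs, f), 3))
```

The printout compares both spectra at 600, 650 and 700 Hz. The DFT magnitude should drop sharply at 650 Hz, which lies between two harmonics of the 100 Hz pitch, while the LPC envelope should change smoothly across the same range and peak only near the two resonances. On real speech you would typically add pre-emphasis and use a higher model order, and the estimates will be less clean than on this synthetic signal.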

Google Scholar has a lot of research on the field, and there are books as well. I'd recommend Acoustic & Auditory Phonetics; it helped me understand it all a bit better, and it's not a tough read!
