How to compare spoken audio against reference recording - language learning

Question

I am looking for a way to compare a user-submitted audio recording against a reference recording, in order to give the user a grade or percentage score for language learning.

I realize that this is a very unscientific way of doing things and is more of a gimmick than anything.

My first thoughts are some sort of audio fingerprinting, or waveform comparison.

Any ideas where I should be looking?

kqnr · Accepted Answer · 2011-04-12T00:00:08.670

This is by no means a trivial problem to solve, though there is an abundance of research on the topic. Presently the most successful forms of machine learning in the speech recognition domain apply Hidden Markov Model techniques.

You may also want to take a look at existing implementations of HMM algorithms. One such library in its early stages is ghmm.

Perhaps even better and more readily applicable to your problem is HTK.
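To make the HMM idea concrete, here is a minimal sketch of how a model turns a sequence into a score, using the forward algorithm on a toy discrete HMM. Everything in it is an assumption for illustration: the function name `forward_likelihood`, the two-state model, and all the probabilities are made up, and it does not use the ghmm or HTK APIs. Real speech toolkits work on continuous acoustic features rather than symbols 0 and 1.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm."""
    n_states = len(start)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical 2-state "word" model: state 0 mostly emits symbol 0,
# state 1 mostly emits symbol 1, and the model only moves left to right.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],
        [0.1, 0.9]]

good = forward_likelihood([0, 0, 1, 1], start, trans, emit)
bad = forward_likelihood([1, 1, 0, 0], start, trans, emit)
print(good > bad)  # the sequence that matches the model scores higher
```

For grading, one would train such a model on reference pronunciations and compare the likelihood of the learner's recording against it, typically in the log domain to avoid underflow on long sequences.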

score 2 · Answer 2 · answered Apr 13 '11 at 16:05

In addition to chomp's great answer, one important keyword you probably need to look up is Dynamic Time Warping (DTW), which aligns two sequences that differ in speed or timing. This is the Wikipedia article: http://en.wikipedia.org/wiki/Dynamic_time_warping
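As a rough illustration of what DTW does, here is a minimal sketch of the classic dynamic-programming formulation on plain 1-D sequences. The function name `dtw_distance` and the sample data are hypothetical. In practice you would compare sequences of per-frame feature vectors (for example MFCCs) rather than single numbers, and replace `abs` with a vector distance.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping on 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Made-up data: the same contour spoken at half speed, and an unrelated one.
reference = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
slower = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
different = [3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 3.0]

print(dtw_distance(reference, slower))
print(dtw_distance(reference, different))
```

The slowed-down copy aligns with zero cost because DTW is allowed to stretch time, while the unrelated contour does not. Mapping the resulting distance to a grade or percentage would still need a scale you calibrate yourself.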

carlosdc
