Ideally, I am looking for a way to get a vector of probabilities, one per phone, that a particular segment of an audio file is that phone. Something like:

input:

  • wavfile
  • start position (e.g. @1.4 sec)
  • duration (e.g. 500 ms)

output:

  • SIL 2.324*10^-3
  • AA 1.514*10^-4
  • AE 1.482*10^-2
  • ...
  • ZH 5.03*10^-5
kkawabat
  • May I ask what exactly you are trying to do? It is quite unusual to estimate a single monophone probability for such a long segment. Also, the probabilities will be represented in log scale, since the raw values are tiny and would otherwise underflow. – Dmytro Prylipko Jan 16 '19 at 12:27
  • Are you aiming only at the acoustic score, or at a score combined with a language model? – Dmytro Prylipko Jan 16 '19 at 12:28
  • @Dmytro Prylipko Thank you for the reply. The numbers I used to illustrate the use case were bad examples: I would be running this on much shorter segments, and I understand the scores would be in log scale. I am only looking for the acoustic scores, independent of the LM. I am trying to use these scores to build a metric for phoneme-level pronunciation accuracy. – kkawabat Jan 16 '19 at 18:34

1 Answer

You can obtain the scores by running HVite in forced alignment mode. I am afraid you have to run it once for every phoneme you have:

HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
    -H macros \
    -H hmmdefs \
    -i acoustic_score_AA.mlf \
    -y lab \
    -I AA.mlf \
    -S index.scp \
    words phones

The output file acoustic_score_AA.mlf will contain the result.
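Since the command has to be repeated for every phoneme, the invocations can be generated in a loop. The following is a minimal Python sketch, not part of HTK: it only rebuilds the argument list shown above with the phone name substituted into the -I and -i file names. The helper name hvite_command and the three-phone list are illustrative assumptions, and it assumes a label file named <PHONE>.mlf exists for each phone, following the AA.mlf naming.

```python
import shlex


def hvite_command(phone):
    """Return the HVite argument list from the answer, specialised for one phone.

    Only the two per-phone file names change; every other flag is copied
    unchanged from the command above.
    """
    return [
        "HVite", "-A", "-D", "-T", "1", "-l", "*", "-o", "NTW",
        "-C", "HTK.cfg", "-a",
        "-H", "macros",
        "-H", "hmmdefs",
        "-i", f"acoustic_score_{phone}.mlf",  # per-phone output file
        "-y", "lab",
        "-I", f"{phone}.mlf",                 # per-phone forced label file (assumed to exist)
        "-S", "index.scp",
        "words", "phones",
    ]


if __name__ == "__main__":
    # Illustrative subset; in practice use the full phone list of the model set.
    for phone in ["AA", "AE", "ZH"]:
        # Print a shell-quoted command line instead of executing it.
        print(shlex.join(hvite_command(phone)))
```

To actually run one of these, the list could be passed to subprocess.run(cmd, check=True), assuming HVite is on the PATH and the model and config files are in the working directory.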

The words dictionary file should map each phone to itself, like this:

AA AA
AE AE
....
ZH ZH

and the phones file has to contain the list of phonemes (HMM model names), as far as I remember.

The trick here is the content of the input .mlf file. For instance, AA.mlf should look like this:

#!MLF!#
"*/S0001.lab"
AA
.

This will force HVite to apply the AA model to the whole utterance. The audio file has to be chunked into segments in advance.
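The two preparation steps, cutting the segment out of the recording and writing the one-phone label file, could be sketched in Python roughly as follows. This is an assumption-laden illustration rather than part of HTK: cut_chunk and write_forced_mlf are hypothetical helper names, the input is assumed to be an uncompressed PCM WAV file, and the default label pattern is simply copied from the AA.mlf example. Whether HVite can read the resulting WAV directly depends on HTK.cfg, which is not shown here.

```python
import wave


def cut_chunk(src_path, dst_path, start_sec, duration_sec):
    """Copy the span [start_sec, start_sec + duration_sec) of a PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Seek to the first frame of the segment, then read its frames.
        src.setpos(int(round(start_sec * rate)))
        frames = src.readframes(int(round(duration_sec * rate)))
    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)


def write_forced_mlf(mlf_path, phone, label_pattern="*/S0001.lab"):
    """Write an MLF that labels the whole utterance with a single phone.

    label_pattern is taken from the example above; adjust it to match
    the names of your own chunk files.
    """
    with open(mlf_path, "w") as f:
        f.write("#!MLF!#\n")
        f.write(f'"{label_pattern}"\n')
        f.write(f"{phone}\n")
        f.write(".\n")
```

For the use case in the question, something like cut_chunk("input.wav", "S0001.wav", 1.4, 0.5) followed by write_forced_mlf("AA.mlf", "AA") would prepare one segment and one phone hypothesis; the file names here are placeholders.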

Dmytro Prylipko