
I'm having trouble building an acoustic model. I use Ubuntu 14.04 in VirtualBox to test-run pocketsphinx and to train my acoustic model with sphinxtrain. Do I need to convert my .wav files to .mfc first and then run the "sphinxtrain run" command? I tried the following steps:

  1. Run the "sphinxtrain run" command.
  2. Run sphinx_fe -i Anuradha-eight.wav -o file.mfc -argfile etc/feat.params to convert the .wav file to .mfc.

Both attempts failed. The output and log files can be seen HERE

dab1984

1 Answer


Do I need to convert my .wav files to .mfc first and then run the "sphinxtrain run" command?

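No. "sphinxtrain run" computes the features (.mfc files) from your .wav files itself, so you do not need to run sphinx_fe by hand.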

Output and Log files can be seen HERE

 Failed to open /home/anuradha/Desktop/Workspace/sinhala/wav/Chathuri/chathuri-amma.wav: No such file or directory

The log says that the file is missing from the path where it is expected.
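A quick way to confirm which entries are really missing is to resolve each line of the fileids file the way the log messages in this thread do: the error above points at `wav/<fileid>.wav`, and the warning quoted in the comments points at `feat/<fileid>.mfc`, both under the database directory. The following is a minimal Python sketch under that assumption; `expected_paths` and `missing_wavs` are hypothetical helpers, not part of sphinxtrain. Keep in mind that paths are case-sensitive on Linux and that hyphens and underscores must match exactly (the question runs sphinx_fe on `Anuradha-eight.wav`, while the warning refers to `Anuradha_eight`).

```python
from pathlib import Path


def expected_paths(db_dir, fileid):
    """Return the (wav, mfc) paths derived from one fileids entry.

    Assumes the layout seen in the logs: audio under wav/<fileid>.wav and
    features under feat/<fileid>.mfc, relative to the database directory.
    """
    db = Path(db_dir)
    return db / "wav" / f"{fileid}.wav", db / "feat" / f"{fileid}.mfc"


def missing_wavs(db_dir, fileids_file):
    """List the fileids entries whose .wav file does not exist under wav/."""
    missing = []
    with open(fileids_file, encoding="utf-8") as fh:
        for line in fh:
            fileid = line.strip()
            if not fileid:
                continue  # skip blank lines
            wav_path, _ = expected_paths(db_dir, fileid)
            if not wav_path.is_file():
                missing.append(fileid)
    return missing
```

For example, `missing_wavs("/home/anuradha/Desktop/Workspace/sinhala", "/home/anuradha/Desktop/Workspace/sinhala/etc/sinhala_train.fileids")` should return an empty list once every referenced recording is in place.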

Nikolay Shmyrev
  • But the file is there. The wav file exists in that location, and I'm also getting
    **WARNING: Error in '/home/anuradha/Desktop/Workspace/sinhala/etc/sinhala_train.fileids', the feature file '/home/anuradha/Desktop/Workspace/sinhala/feat/Anuradha/Anuradha_eight.mfc' does not exist, or is empty**
    My etc and wav folder contents can be seen here: http://s000.tinyupload.com/?file_id=73024785348361020085
    – dab1984 Jul 09 '15 at 03:37
  • Your files have the .wma extension. Files must be transcoded to WAV, and the format must be 16 kHz, 16-bit, mono. – Nikolay Shmyrev Jul 09 '15 at 08:32
  • And obviously the files must have the .wav extension, not .wma. – Nikolay Shmyrev Jul 09 '15 at 10:43
  • I have corrected the wav file mistake and ran sphinxtrain, but now I'm getting:
    WARN: "gauden.c", line 1349: Scaling factor too small: -885637.551544
    WARN: "gauden.c", line 1349: Scaling factor too small: -5339149.160396
    WARN: "gauden.c", line 1349: Scaling factor too small: -1041694.181730
    ERROR: "backward.c", line 1011: alpha(2.500000e-01) <> sum of alphas * betas (0.000000e+00) in frame 113
    ERROR: "baum_welch.c", line 324: anuradha/anuradha-eight ignored
    My file structure can be seen here: http://s000.tinyupload.com/?file_id=30374866907752211810 – dab1984 Jul 10 '15 at 03:33
  • It is ok, you need more training data – Nikolay Shmyrev Jul 10 '15 at 05:24
  • What is the minimum amount of training data? For example, 200 words with 200 recordings? Since I'm doing this for the Sinhala language, I want to test with a minimal set before moving to a large amount of data. – dab1984 Jul 10 '15 at 12:25
  • It is covered at the beginning of the tutorial; you should just read it through. – Nikolay Shmyrev Jul 10 '15 at 20:14
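Because both the file extension and the audio format came up in the comments, a small audit of the wav folder before rerunning training can save a cycle. The sketch below uses only Python's standard `wave` module and assumes the 16 kHz, 16-bit, mono format stated above. It flags files whose extension is not `.wav` and files that are not readable PCM WAV at all (a WMA file that is merely renamed to `.wav` still fails, which is why transcoding is required). It also totals the duration of the usable files, which is one way to measure a corpus against the tutorial's guidance on training data size. `check_wav` and `audit` are hypothetical helper names, not sphinxtrain tools.

```python
import wave
from pathlib import Path

# Format stated in the thread: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE = 16000
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of problems for one file (empty if it looks usable)."""
    path = Path(path)
    problems = []
    if path.suffix.lower() != ".wav":
        problems.append(f"extension is {path.suffix or '(none)'}, expected .wav")
    try:
        with wave.open(str(path), "rb") as w:
            if w.getframerate() != EXPECTED_RATE:
                problems.append(f"sample rate {w.getframerate()} Hz, expected {EXPECTED_RATE}")
            if w.getsampwidth() != EXPECTED_SAMPWIDTH:
                problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
            if w.getnchannels() != EXPECTED_CHANNELS:
                problems.append(f"{w.getnchannels()} channels, expected mono")
    except (wave.Error, EOFError) as exc:
        # Raised when the content is not a PCM RIFF/WAVE file (e.g. WMA data) or is empty.
        problems.append(f"not a readable PCM WAV file ({exc})")
    return problems


def audit(wav_dir):
    """Check every file under wav_dir and total the duration of the usable ones."""
    total_seconds = 0.0
    report = {}
    for path in sorted(Path(wav_dir).rglob("*")):
        if not path.is_file():
            continue
        problems = check_wav(path)
        if problems:
            report[str(path)] = problems
        else:
            with wave.open(str(path), "rb") as w:
                total_seconds += w.getnframes() / w.getframerate()
    return report, total_seconds
```

For example, `report, seconds = audit("/home/anuradha/Desktop/Workspace/sinhala/wav")` returns a dict mapping each problem file to its issues, plus the total amount of usable audio in seconds.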