I am training an acoustic model with CMU Sphinx. I have around 200 hours of speech data. When I ran the training script (sphinxtrain run), 4 of my CPU cores were initially at 100% and my machine got hot, so it was clearly doing some calculations. However, the script now looks like it is hung at Module 20, Phase 3. Upon inspection, I realized that 4 copies of "Perl 5.12" are running on my machine with 0% CPU utilization and updating a file called qmanager/bw.2.4.out in the training directory (Baum-Welch?). This file is constantly updated; I have an SSD drive.

My question is whether this is normal (that CPU usage is at 0%) and is there a way to speed up the training.

Mogsdad
Mikrasya

1 Answer

My question is whether this is normal (that CPU usage is at 0%) and is there a way to speed up the training.

No, there was an error. You can check the details in the logs in the logdir folder. Most likely you specified an incorrect path to some data file or to a library. Sometimes it's OK to simply restart.
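As a starting point for that log review, here is a minimal Python sketch that counts `ERROR` and `FATAL` lines per log file, so you know which logs to open first. It assumes the logs live under a directory such as `logdir` and end in `.log`; the function name `summarize_logs` is made up for illustration and is not part of sphinxtrain. A log with no such lines can still reveal the problem, so this only helps you prioritize and does not replace reading the logs.

```python
from collections import Counter
from pathlib import Path
import re

# Matches lines that start with ERROR or FATAL (including FATAL_ERROR),
# e.g.  ERROR: "backward.c", line 421: ...
PROBLEM = re.compile(r"^\s*(ERROR|FATAL)")


def summarize_logs(logdir):
    """Return {relative log path: Counter of ERROR/FATAL line counts}.

    Assumption: log files end in ".log" and sit somewhere below logdir.
    Files without any matching line are left out of the result.
    """
    root = Path(logdir)
    summary = {}
    for path in sorted(root.rglob("*.log")):
        counts = Counter()
        with open(path, errors="replace") as handle:
            for line in handle:
                match = PROBLEM.match(line)
                if match:
                    counts[match.group(1)] += 1
        if counts:
            summary[str(path.relative_to(root))] = counts
    return summary
```

Calling `summarize_logs("logdir")` from the training directory returns a dictionary you can print or sort by count; then open the listed files and read them from the top.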

Make sure the an4 tutorial works for you first.

Is there a way to speed up the training?

It should be pretty fast. If you enable training on 8 cores, a model on 200 hours of data should train in 1 day.
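To check whether multi-core training is actually enabled, the sketch below reads the text of the training configuration and reports two settings. It assumes the usual `etc/sphinx_train.cfg` layout, where `$CFG_QUEUE_TYPE` selects the job queue and `$CFG_NPART` sets the number of parallel parts; verify both names against your own config file. The function name `parallel_settings` is hypothetical.

```python
import re

# Assumption: settings appear as Perl assignments, one per line, e.g.
#   $CFG_QUEUE_TYPE = "Queue::POSIX";
#   $CFG_NPART = 8;
# Lines commented out with a leading "#" are not matched.
SETTING = re.compile(
    r'^\s*\$(CFG_QUEUE_TYPE|CFG_NPART)\s*=\s*"?([^";]+)"?\s*;',
    re.MULTILINE,
)


def parallel_settings(cfg_text):
    """Return the queue type and part count found in the config text.

    Keys are the setting names without the leading "$"; a setting that
    is absent or commented out is simply missing from the result.
    """
    return {name: value.strip() for name, value in SETTING.findall(cfg_text)}
```

For example, `parallel_settings(open("etc/sphinx_train.cfg").read())` on a config set up for 8 cores would be expected to show a POSIX queue type and a part count of 8. If the queue type is still the plain `Queue` value, training is probably running serially.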

Mogsdad
Nikolay Shmyrev
  • Thanks Nikolay. I got two occurrences of the following error: "INFO: cmn.c(175): CMN: 14.39 0.26 -0.38 0.19 -0.53 -0.12 -0.22 -0.44 -0.18 -0.37 0.01 -0.32 -0.12 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached ERROR: "baum_welch.c", line 324: speaker_6/06_004012 ignored". I was ignoring it, assuming it is related to the training data. Any insight? – Mikrasya Mar 03 '14 at 14:31
  • This error is not relevant. You need to check all errors in logdir, not just the single one you noticed. You can provide the database folder for analysis. Like I said, you can try to train on a smaller database first. – Nikolay Shmyrev Mar 03 '14 at 16:00
  • I've tried with a much smaller dataset (my own, not an4) and still have the same issue: it keeps writing the file that I mentioned in my original post. There is no error in the .html file; I've also grep'd the logdir directory and there is no error there either. – Mikrasya Mar 06 '14 at 15:37
  • You need to read the logs in logdir, not grep them. You can share them to get more meaningful advice. To get help effectively, you can share your small database as a whole. You can share your attempt to build the an4 database too. – Nikolay Shmyrev Mar 07 '14 at 09:01
  • Thanks Nikolay. My script was deleting "feat.params", which caused the problem. I still have other issues, but at least it no longer hangs. – Mikrasya Mar 08 '14 at 21:49
  • OK, great, let me know how it goes. – Nikolay Shmyrev Mar 09 '14 at 05:56