Phoneme Recognition with PocketSphinx

Question

I need real-time phoneme recognition from the microphone on Windows 8 Desktop. I followed http://cmusphinx.sourceforge.net/wiki/phonemerecognition and built pocketsphinx_continuous from the Subversion source in VS2013. Running it from the command line as Administrator:

D:\_SPHINX\cmusphinx-code-13103-trunk\pocketsphinx\bin\Release\Win32>pocketsphinx_continuous.exe -infile ../../../test/data/goforward.raw -hmm ../../../model/en-us/en-us -allphone ../../../model/en-us/en-us-phone.lm.bin -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from ../../../model/en-us/en-us/feat.params
Current configuration:
[NAME]                  [DEFLT]         [VALUE]
-agc                    none            none
-agcthresh              2.0             2.000000e+000
-allphone                               ../../../model/en-us/en-us-phone.lm.bin
-allphone_ci            no              no
-alpha                  0.97            9.700000e-001
-ascale                 20.0            2.000000e+001
-aw                     1               1
-backtrace              no              yes
-beam                   1e-48           1.000000e-020
-bestpath               yes             yes
-bestpathlw             9.5             9.500000e+000
-ceplen                 13              13
-cmn                    current         current
-cmninit                8.0             40,3,-1
-compallsen             no              no
-debug                                  0
-dict
-dictcase               no              no
-dither                 no              no
-doublebw               no              no
-ds                     1               1
-fdict                                  ../../../model/en-us/en-us/noisedict
-feat                   1s_c_d_dd       1s_c_d_dd
-featparams                             ../../../model/en-us/en-us/feat.params
-fillprob               1e-8            1.000000e-008
-frate                  100             100
-fsg
-fsgusealtpron          yes             yes
-fsgusefiller           yes             yes
-fwdflat                yes             yes
-fwdflatbeam            1e-64           1.000000e-064
-fwdflatefwid           4               4
-fwdflatlw              8.5             8.500000e+000
-fwdflatsfwin           25              25
-fwdflatwbeam           7e-29           7.000000e-029
-fwdtree                yes             yes
-hmm                                    ../../../model/en-us/en-us
-input_endian           little          little
-jsgf
-keyphrase
-kws
-kws_delay              10              10
-kws_plp                1e-1            1.000000e-001
-kws_threshold          1               1.000000e+000
-latsize                5000            5000
-lda
-ldadim                 0               0
-lifter                 0               22
-lm
-lmctl
-lmname
-logbase                1.0001          1.000100e+000
-logfn
-logspec                no              no
-lowerf                 133.33334       1.300000e+002
-lpbeam                 1e-40           1.000000e-040
-lponlybeam             7e-29           7.000000e-029
-lw                     6.5             2.000000e+000
-maxhmmpf               30000           30000
-maxwpf                 -1              -1
-mdef                                   ../../../model/en-us/en-us/mdef
-mean                                   ../../../model/en-us/en-us/means
-mfclogdir
-min_endfr              0               0
-mixw
-mixwfloor              0.0000001       1.000000e-007
-mllr
-mmap                   yes             yes
-ncep                   13              13
-nfft                   512             512
-nfilt                  40              25
-nwpen                  1.0             1.000000e+000
-pbeam                  1e-48           1.000000e-020
-pip                    1.0             1.000000e+000
-pl_beam                1e-10           1.000000e-010
-pl_pbeam               1e-10           1.000000e-010
-pl_pip                 1.0             1.000000e+000
-pl_weight              3.0             3.000000e+000
-pl_window              5               5
-rawlogdir
-remove_dc              no              no
-remove_noise           yes             yes
-remove_silence         yes             yes
-round_filters          yes             yes
-samprate               16000           1.600000e+004
-seed                   -1              -1
-sendump                                ../../../model/en-us/en-us/sendump
-senlogdir
-senmgau
-silprob                0.005           5.000000e-003
-smoothspec             no              no
-svspec                                 0-12/13-25/26-38
-tmat                                   ../../../model/en-us/en-us/transition_matrices
-tmatfloor              0.0001          1.000000e-004
-topn                   4               4
-topn_beam              0               0
-toprule
-transform              legacy          dct
-unit_area              yes             yes
-upperf                 6855.4976       6.800000e+003
-uw                     1.0             1.000000e+000
-vad_postspeech         50              50
-vad_prespeech          20              20
-vad_startspeech        10              10
-vad_threshold          2.0             2.000000e+000
-var                                    ../../../model/en-us/en-us/variances
-varfloor               0.0001          1.000000e-004
-varnorm                no              no
-verbose                no              no
-warp_params
-warp_type              inverse_linear  inverse_linear
-wbeam                  7e-29           7.000000e-029
-wip                    0.65            6.500000e-001
-wlen                   0.025625        2.562500e-002

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: ../../../model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: ../../../model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: ../../../model/en-us/en-us/transition_matrices

At the last INFO line (reading the HMM transition matrices), Windows 8 throws an error.

Is something wrong in the PocketSphinx debug output or in my command-line options, or is it purely a Windows problem? I noticed that the build output is in /bin/Release/Win32, while my Windows 8 is 64-bit, running on an Intel NUC. Sphinxbase.dll was compiled from Subversion in Debug mode, whereas PocketSphinx only had a Release build.

I also read somewhere that phoneme timing information is available. How do I get it?
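A minimal sketch for the timing part, assuming the output format shown in the logs in this post: with -backtrace yes, pocketsphinx_continuous prints a table with word, start, end, pprob, ascr, lscr and lback columns, where start and end are frame indices. At the -frate of 100 shown in the configuration dump, one frame is 10 ms. The end frame is treated as inclusive here because the SIL row in the second log spans frames 51 to 264, which is exactly the 214 frames that allphone_search.c reports. The helper name parse_backtrace is hypothetical; it only parses the printed text and is not part of PocketSphinx.

```python
# Hypothetical helper: turn the "-backtrace yes" segment table printed by
# pocketsphinx_continuous into (unit, start_sec, end_sec) tuples.
# Assumptions: columns are "word start end pprob ascr lscr lback" as in the
# logs in this post, start/end are inclusive frame indices, and -frate is 100.


def parse_backtrace(text, frate=100):
    segments = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["word", "start", "end"]:
            in_table = True  # header row; segment rows follow
            continue
        if not in_table:
            continue
        if len(fields) != 7 or not fields[1].isdigit() or not fields[2].isdigit():
            in_table = False  # first non-segment line ends the table
            continue
        unit, start, end = fields[0], int(fields[1]), int(fields[2])
        # end is inclusive, so the segment covers (end - start + 1) frames
        segments.append((unit, start / frate, (end + 1) / frate))
    return segments


if __name__ == "__main__":
    # Lines copied from the second log in this post
    log = """INFO: pocketsphinx.c(1133): SIL (-858993460)
word                 start end   pprob ascr       lscr       lback
SIL                  51    264   1.000 -1627      0          0
INFO: allphone_search.c(911): Hyp: SIL
"""
    for unit, start_s, end_s in parse_backtrace(log):
        print(f"{unit}: {start_s:.2f}s - {end_s:.2f}s")  # SIL: 0.51s - 2.65s
```

Parsing the printed table keeps the sketch independent of any particular PocketSphinx build; with real phoneme output, each phone would appear as its own row.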

ADDITION: Following Nikolay's advice and using these parameters, I eliminated the errors, but got no phonemes:

D:\_SPHINX\pocketsphinx\bin\Debug>pocketsphinx_continuous.exe -infile ../../test/data/goforward.raw -hmm ../../model/en-us/en-us -allphone ../../model/en-us/en-us.lm.dmp -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0 -debug 3 -verbose yes
INFO: cmd_ln.c(697): Parsing command line:
pocketsphinx_continuous.exe \
        -infile ../../test/data/goforward.raw \
        -hmm ../../model/en-us/en-us \
        -allphone ../../model/en-us/en-us.lm.dmp \
        -backtrace yes \
        -beam 1e-20 \
        -pbeam 1e-20 \
        -lw 2.0 \
        -debug 3 \
        -verbose yes
. . . .
INFO: acmod.c(252): Parsed model-specific feature parameters from ../../model/en-us/en-us/feat.params
INFO: fe_interface.c(177): Current FE Parameters:
INFO: fe_interface.c(178):      Sampling Rate:             16000.000000
INFO: fe_interface.c(179):      Frame Size:                410
INFO: fe_interface.c(180):      Frame Shift:               160
INFO: fe_interface.c(181):      FFT Size:                  512
INFO: fe_interface.c(183):      Lower Frequency:           130
INFO: fe_interface.c(185):      Upper Frequency:           6800
INFO: fe_interface.c(186):      Number of filters:         25
INFO: fe_interface.c(187):      Number of Overflow Samps:  0
INFO: fe_interface.c(188):      Start Utt Status:          0
INFO: fe_interface.c(190): Will not remove DC offset at frame level
INFO: fe_interface.c(196): Will not add dither to audio
INFO: fe_interface.c(200): Will apply sine-curve liftering, period 22
INFO: fe_interface.c(203): Will normalize filters to unit area
INFO: fe_interface.c(205): Will round filter frequencies to DFT points
INFO: fe_interface.c(207): Will not use double bandwidth in mel filter
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: ../../model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: ../../model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: ../../model/en-us/en-us/transition_matrices
INFO: acmod.c(124): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ../../model/en-us/en-us/means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ../../model/en-us/en-us/variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file ../../model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(115): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4101 * 20 bytes (80 KiB) for word entries
INFO: dict.c(342): Reading filler dictionary: ../../model/en-us/en-us/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194
INFO: ngram_model_dmp.c(242):    19794 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288):  1377200 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314):  3178194 = LM.trigrams read
INFO: ngram_model_dmp.c(339):    57155 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359):    10935 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    34843 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407):     2690 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463):    19794 = ascii word strings read
INFO: allphone_search.c(239): Building PHMM net of 137095 phones
INFO: allphone_search.c(312): 29324 nodes, 1958591 links
INFO: allphone_search.c(611): Allphone(beam: -450, pbeam: -450)
INFO: continuous.c(299): pocketsphinx_continuous.exe COMPILED ON: Aug 23 2015, AT: 14:00:33

INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 44.50 -4.13  0.15  6.94  4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25  0.48 >
INFO: allphone_search.c(852): 214 frames, 214 HMMs (1/fr), 642 senones (3/fr), 214 history entries (1/fr)
INFO: allphone_search.c(865): allphone 0.61 CPU 0.283 xRT
INFO: allphone_search.c(867): allphone 0.62 wall 0.290 xRT
INFO: allphone_search.c(911): Hyp: SIL
INFO: pocketsphinx.c(1133): SIL (-858993460)
word                 start end   pprob ascr       lscr       lback
SIL                  51    264   1.000 -1627      0          0
INFO: allphone_search.c(911): Hyp: SIL
SIL
INFO: cmn_prior.c(131): cmn_prior_update: from < 44.50 -4.13  0.15  6.94  4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25  0.48 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 44.50 -4.13  0.15  6.94  4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25  0.48 >
INFO: allphone_search.c(852): 0 frames, 0 HMMs (0/fr), 0 senones (0/fr), 0 history entries (0/fr)
INFO: allphone_search.c(651): TOTAL fwdflat 0.61 CPU 0.285 xRT
INFO: allphone_search.c(654): TOTAL fwdflat 0.64 wall 0.298 xRT

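The frame-to-time relationship can be cross-checked against this log. A small sketch, assuming the analysis window and hop are simply -wlen and 1/-frate converted to whole samples; this assumption reproduces the "Frame Size: 410" and "Frame Shift: 160" lines above, but frame_geometry is a hypothetical helper, not a PocketSphinx function.

```python
# Hypothetical helper: derive front-end frame geometry from the config values.
# Assumption: window and hop are rounded to whole samples, which matches the
# "Frame Size: 410" and "Frame Shift: 160" lines in the log above.


def frame_geometry(samprate, frate, wlen):
    """Return (frame_size, frame_shift) in samples."""
    frame_size = round(wlen * samprate)    # -wlen seconds -> samples per window
    frame_shift = round(samprate / frate)  # -frate frames/s -> samples per hop
    return frame_size, frame_shift


if __name__ == "__main__":
    # Values from the configuration dump: -samprate 16000, -frate 100, -wlen 0.025625
    print(frame_geometry(16000, 100, 0.025625))  # (410, 160)
    # "214 frames" from allphone_search.c(852) is about 2.14 s of processed audio
    print(214 / 100)  # 2.14
```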
What is the correct set of command-line parameters to get phoneme output?

You need to compile pocketsphinx properly, following the instructions provided. In your build it seems you mixed up the runtime options in the project, which you modified incorrectly. If you can't compile it yourself, download the precompiled version. — Nikolay Shmyrev, Aug 23 '15 at 06:36
When I try the precompiled version on my Win8, it says "The program can't start because MSVCR110D.dll is missing". That meant I had to install the VC2012 redistributable; I installed it, but it still didn't help. So I had to recompile it myself in VS2013: sphinxbase first, then I copied sphinxbase.dll, .exp and .lib from /sphinxbase/bin/Debug to /pocketsphinx/bin/Debug, then pocketsphinx second. It apparently started working, complaining that "../../model/en-us/en-us-phone.lm.bin is not a dump file", but providing a list of phonemes: "SIL UH OW F AO R W ER D JH T HH AE N IY IH NG DH IY IH ZH ER Z S V SIL". — K-man, Aug 24 '15 at 05:46
With my best guess for the -allphone parameter, "../../model/en-us/en-us.lm.dmp", the errors are gone, but so are the phonemes; see the output in the edited question above. What are the right command-line options? Are these OK: pocketsphinx_continuous.exe -infile ../../test/data/goforward.raw -hmm ../../model/en-us/en-us -allphone ../../model/en-us/en-us.lm.dmp -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0 -debug 3 -verbose yes — K-man, Aug 24 '15 at 05:46
SIL alone is just silence, but we know the goforward.raw file contains the phrase "go forward ten years", and pocketsphinx does recognize it. However, it doesn't output the corresponding phonemes when -allphone is set to en-us.lm.dmp. If I set it to a non-existent file, it gives phonemes! So what is the valid argument for -allphone? — K-man, Aug 24 '15 at 18:41
Do we even need this language model to get a list of phonemes? WARN: "allphone_search.c", line 582: Failed to load language model specified in -allphone, doing unconstrained phone-loop decoding — K-man, Aug 24 '15 at 18:43
