I am using the GStreamer plugin from Alumae with the following pipeline:

appsrc source='appsrc' ! wavparse ! audioconvert ! audioresample ! queue ! kaldinnet2onlinedecoder <parameters snipped> ! filesink location=/tmp/test

I always get the following assert, which I don't understand: KALDI_ASSERT(current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim);

What does this assertion mean, and how can I fix it?

FYI, the data pushed into the pipeline comes from a streamed WAV file, and replacing kaldinnet2onlinedecoder with wavenc correctly generates a WAV file instead of a text file at the end.

EDIT: Here are the parameters used:

use-threaded-decoder=0   
model=/opt/en/final.mdl   
word-syms=<word-file>  
fst=<fst_file>
mfcc-config=<mfcc-file>  
ivector-extraction-config=/opt/en/ivector-extraction/ivector_extractor.conf  
max-active=10000  
beam=10.0  
lattice-beam=6.0  
do-endpointing=1  
endpoint-silence-phones=\"1:2:3:4:5:6:7:8:9:10\"  
traceback-period-in-secs=0.25  
num-nbest=10  

For your information, using the textual representation of the pipeline in Python works, but building it in code (i.e. using Gst.ElementFactory.make and so on) always throws the exception.

SECOND UPDATE: Here is the full stack trace generated by the assert:

ASSERTION_FAILED ([5.2]:AdvanceChunk():decodable-online-looped.cc:223) : 'current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim'

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnetLoopedOnlineBase::AdvanceChunk()
kaldi::nnet3::DecodableNnetLoopedOnlineBase::EnsureFrameIsComputed(int)
kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()
Frédéric Praca
  • `parameters snipped` is a bad idea. Overall, if you need an answer you need to provide more details. – Nikolay Shmyrev Nov 24 '17 at 22:52
  • Well, I expected that the Kaldi Gst element would negotiate its caps like any other Gst plugin, which is why I didn't provide the parameters. I just wanted to understand what the assert means. I don't have access to the parameters right now, but I will provide them asap. – Frédéric Praca Nov 24 '17 at 23:02
  • Sorry for the delay, I just added the parameters in the question – Frédéric Praca Nov 29 '17 at 08:45
  • Ok, and what model is this? For newer models you need to add frame subsampling factor probably. – Nikolay Shmyrev Nov 30 '17 at 15:24
  • A brand new nnet3 model I got from a subcontractor but it's working without any frame subsampling using textual representation. – Frédéric Praca Nov 30 '17 at 15:28
  • So, back to the troubles. In order to make our model work we need the subsampling factor, but adding it to the properties triggers the assert in Kaldi. After reading the code of the kaldi gst server (https://github.com/alumae/kaldi-gstreamer-server), we've seen that data is sent by the client in chunks of samplerate/4. Why this specific value? – Frédéric Praca Jan 19 '18 at 14:00
  • You need to provide more information about the assert. A chunk size of samplerate/4 is unrelated; it could be pretty arbitrary. Samplerate/4 is 1/4 of a second, which is a reasonable value for fast response. – Nikolay Shmyrev Jan 19 '18 at 22:09
  • I updated the main question to provide the whole stack trace. If I mentioned the mysterious _samplerate/4_, it is only because that's the only difference I can see between the _kaldi-gstreamer-server_ and our code. Moreover, it seems to show that the gst plugin can't work in a simple chain when the subsampling factor parameter is added. – Frédéric Praca Jan 22 '18 at 08:32
  • Changing _samplerate/4_ does not trigger the assert but has an impact on transcription quality. I will keep looking at what the worker code of _kaldi-gstreamer-server_ does to find what the problem might be. – Frédéric Praca Jan 22 '18 at 09:59

1 Answer

I finally got it working, even with the frame-subsampling-factor parameter.

The problem lies in the order of the parameters: the fst and model parameters have to be set last.

Thus, the following textual pipeline works:

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
           audioconvert ! \
           audioresample ! tee name=t ! queue ! \
       kaldinnet2onlinedecoder \
       use-threaded-decoder=0 \
       nnet-mode=3 \
       word-syms=/opt/models/fr/words.txt \
       mfcc-config=/opt/models/fr/mfcc_hires.conf \
       ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
       phone-syms=/opt/models/fr/phones.txt \
       frame-subsampling-factor=3 \
       max-active=7000 \
       beam=13.0 \
       lattice-beam=8.0 \
       acoustic-scale=1 \
       do-endpointing=1 \
       endpoint-silence-phones=1:2:3:4:5:16:17:18:19:20 \
       traceback-period-in-secs=0.25 \
       num-nbest=2 \
       chunk-length-in-secs=0.25 \
       fst=/opt/models/fr/HCLG.fst \
       model=/opt/models/fr/final.mdl \
       ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0
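
When the element is built in code rather than from a textual description, the same ordering can presumably be enforced explicitly. The following is a minimal Python sketch, not the plugin's own code: `ordered_properties`, `configure` and `RecordingElement` are hypothetical names introduced here, and `RecordingElement` only stands in for a real element so that the call order can be shown without GStreamer installed. In a real program, `element` would be the object returned by `Gst.ElementFactory.make` for `kaldinnet2onlinedecoder`.

```python
# Properties that, according to the answer above, must be set after all others.
LOAD_LAST = ("fst", "model")


def ordered_properties(props):
    """Return (name, value) pairs with the LOAD_LAST names moved to the end."""
    first = [(name, value) for name, value in props.items() if name not in LOAD_LAST]
    last = [(name, props[name]) for name in LOAD_LAST if name in props]
    return first + last


def configure(element, props):
    """Call element.set_property(name, value) with fst and model last."""
    for name, value in ordered_properties(props):
        element.set_property(name, value)


class RecordingElement:
    """Hypothetical stand-in for a Gst.Element; it only records the call order."""

    def __init__(self):
        self.calls = []

    def set_property(self, name, value):
        self.calls.append(name)


element = RecordingElement()
configure(element, {
    "model": "/opt/models/fr/final.mdl",
    "fst": "/opt/models/fr/HCLG.fst",
    "nnet-mode": 3,
    "frame-subsampling-factor": 3,
})
print(element.calls)
# ['nnet-mode', 'frame-subsampling-factor', 'fst', 'model']
```

Keeping the ordering in one helper means the dictionary of settings can be written or loaded in any order without reintroducing the problem.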

I opened an issue on GitHub for this because the cause is really difficult to find and should at least be documented.
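
For reference, the condition in the failed assertion can be read as a shape check on the matrix of log-posteriors computed for one chunk. The sketch below restates that condition in Python. The function name and all the numbers are made up for illustration; they are not taken from the model in the question or from Kaldi's defaults.

```python
def chunk_shape_ok(num_rows, num_cols, frames_per_chunk,
                   frame_subsampling_factor, output_dim):
    """Mirror the asserted condition: rows and columns must match expectations."""
    # The C++ expression divides two integers, so integer division is used here.
    expected_rows = frames_per_chunk // frame_subsampling_factor
    return num_rows == expected_rows and num_cols == output_dim


# Hypothetical chunk of 21 input frames with a subsampling factor of 3:
# the network output is expected to have 21 // 3 = 7 rows.
print(chunk_shape_ok(7, 100, 21, 3, 100))   # True

# Same settings, but the output has one row per input frame: the check fails.
print(chunk_shape_ok(21, 100, 21, 3, 100))  # False
```

One plausible reading, not confirmed here, is that the check fails when the subsampling factor in effect while the model was loaded differs from the one the decoder later expects, which would be consistent with having to set fst and model last.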

Frédéric Praca