Questions tagged [kaldi]

Kaldi speech recognition toolkit

Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

113 questions
2 votes, 0 answers

Unable to stream live audio from mic to remote port in PyAudio

I have a transcription server listening for audio on a port on a remote machine. Everything works if I stream a pre-recorded audio file to the port using netcat, but I'm not able to do the same using the mic as input. I'm trying the following but…
Jaskaran Singh Puri
2 votes, 1 answer

KALDI after training

I've been learning Kaldi these days. I followed the tutorial and completed some examples like yesno, voxforge, ynstadial, and a custom digits ASR. But after completing all of the above, I only got something like a WER of 5% and some logs. How…
Eric WU
2 votes, 0 answers

PyAudio callback not running during blocking operation

I'm writing an audio processing script which listens for audio and runs speech recognition on it. I'm using a PyAudio callback function to capture audio frames and trigger recording/stop when the audio level is above a certain threshold. The problem…
2 votes, 1 answer

Kaldi: Output of qsub was: qsub: illegal -c value "" when trying to run the Common Voice recipe

I am trying to run Kaldi's Common Voice recipe (kaldi/egs/commonvoice/s5/run.sh) on my computer (i.e., not on a cluster). It crashes with the error message Output of qsub was: qsub: illegal -c value "". What could be the issue? Specifically, here…
Franck Dernoncourt
1 vote, 0 answers

"phones in the dictionary that do not have acoustic models" montreal forced aligner

I am trying to follow the example in the MFA documentation. On my computer (Windows 10, Python 3.9, pip 21.2.4) I execute: pip install montreal-forced-aligner and then mfa download acoustic english. Then, when I execute: mfa align path/to/dataset…
Yanirmr
1 vote, 1 answer

Cannot run .c because of segmentation fault using vosk

I'm on Ubuntu 18.04 and I'm trying to run a .c file that came with an API called Vosk; I just want to run it. The issue is that the Python script (which comes standard with the API) works without any problems, but after compiling with make the .c…
Birto
1 vote, 0 answers

fstpushspecial: error while loading shared libraries: fstpushspecial: unsupported version 0 of Verneed record

I am trying to run a Kaldi recipe and am getting the above error on my terminal. I have referred to the solutions posted for using shared libraries, but the issue remains unresolved. Please look into the screenshots attached below…
1 vote, 0 answers

Failed to install pykaldi on ubuntu 18.04

I followed the instructions and ran the following commands to install pykaldi: git clone https://github.com/pykaldi/pykaldi.git cd pykaldi sudo apt-get install autoconf automake cmake curl g++ git graphviz libatlas3-base libtool make pkg-config…
Soroush
1 vote, 1 answer

Creating a project specific Vosk dictionary

I am working on an application which uses Vosk for speech recognition. I would like to create a dictionary for the application which contains only the trigger words and spoken numbers needed by the application. Using command line instructions found…
portsample
1 vote, 1 answer

How to resolve this Kaldi ASR MFCC feature Extraction

I am facing an issue related to Kaldi feature extraction. I am new to Kaldi, please help me out. OS: Ubuntu 18.04. I am currently trying to extract MFCC features and get VAD from the speech. When I run the file mfcc.sh #!/bin/bash #cd…
1 vote, 1 answer

Need to reload vosk model for every transcription?

The Vosk model that I'm using is vosk-model-en-us-aspire-0.2 (1.4GB). Loading the model takes quite a long time every time. Is it necessary to recreate the Vosk object every time? It takes much time to load the model if we only load…
1 vote, 0 answers

Vocal command recognition with Kaldi-ASR?

My daughter and I are building a robot horse. One design goal is to use speech-recognition to recognize commands given to the horse and respond accordingly. Since most of the commands are barely English words, I need something that I can create…
SvdSinner
1 vote, 0 answers

Neural net expects 'ivector' features with dimension 100 but you provided 0

I am using gooofy zamia-speech for Kaldi model adaptation in a project. I followed the steps given by kaldi-adapt-lm to create the model using the kaldi-generic-de-tdnn_f-r20190328 model. When I tested it on a .wav file, it showed the following…
1 vote, 0 answers

Why is the ngram-merge of srilm taking wrong input?

This is my first post here, and sorry for my poor English. I'm currently working with Kaldi and the srilm tools for my research, but I ran into a strange problem while using ngram-merge to merge the 3-gram.count files generated by ngram-count. (ngram-count…
趙哲宏
1 vote, 0 answers

How to detect filler sound like um, uh, etc using cmusphinx/mozilla deepspeech/google stt etc?

I am working on a speech recognition project, and the task is to detect filler sounds like um, uh, eh, etc. in audio clips of children/students speaking in English. Their spoken English is not that great. How can this be done using…
Sumit Jangra