Questions tagged [automatic-speech-recognition]

Automatic Speech Recognition (ASR) is the use of computers to recognize human speech in audio.

9 questions
2
votes
2 answers

How to get a list of all Hugging Face models using Python?

Is there any way to get a list of the models available on Hugging Face, e.g. for Automatic Speech Recognition (ASR)?
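One possible approach, sketched here under assumptions: the third-party `huggingface_hub` package provides `HfApi.list_models`, which accepts a `filter` string such as a task tag and a `limit`. The helper name `list_model_ids` and its `lister` parameter are hypothetical, added only so the function can be exercised without network access; the `id` attribute assumes a recent `huggingface_hub` release.

```python
from typing import Callable, List, Optional


def list_model_ids(
    task: str = "automatic-speech-recognition",
    limit: Optional[int] = 20,
    lister: Optional[Callable] = None,
) -> List[str]:
    """Return model ids on the Hugging Face Hub that carry the given task tag.

    `lister` is a hypothetical injection point (not part of huggingface_hub);
    by default the real HfApi.list_models is used, which needs network access.
    """
    if lister is None:
        # Imported lazily so the helper can be used with a stand-in lister
        # even when huggingface_hub is not installed.
        from huggingface_hub import HfApi

        lister = HfApi().list_models
    # Each returned item is expected to expose its repository name as `.id`.
    return [model.id for model in lister(filter=task, limit=limit)]


if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and an internet connection.
    for model_id in list_model_ids(limit=5):
        print(model_id)
```

Passing `limit=None` asks for every matching model, which can take a while for popular tasks.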
2
votes
1 answer

How to segment and transcribe the audio from a video into timestamped segments?

I want to segment a video transcript into chapters based on the content of each line of speech. The transcript would be used to generate a series of start and end timestamps for each chapter. This is similar to how YouTube now "auto-chapters"…
0
votes
0 answers

react-speech-recognition package not working

It's a simple React package that converts user audio to text. I installed the package and tried its basic code example, but it shows an error: "RecognitionManager.js:247 Uncaught ReferenceError: regeneratorRuntime is not defined". import React from…
0
votes
1 answer

Why is Word Information Lost (WIL) calculated the way it is?

Word Information Lost (WIL) is a measure of the performance of an automated speech recognition (ASR) service (e.g. AWS Transcribe, Google Speech-to-Text, etc.) against a gold standard (usually human-generated) transcript, and is generally considered…
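For context, WIL is commonly defined as 1 - (H/N1)(H/N2), where H is the number of correctly recognized words (hits), N1 the number of words in the reference and N2 the number of words in the hypothesis. The sketch below computes it from a word-level edit-distance alignment. The function names are hypothetical, tokenization is plain whitespace splitting, and ties between equally cheap alignments are broken in favor of more hits; all of these are assumptions of this sketch, not necessarily what any particular ASR toolkit does.

```python
from typing import List


def count_hits(ref: List[str], hyp: List[str]) -> int:
    """Count matching words in a minimum-edit-distance alignment.

    Each DP cell holds (edit_cost, -hits), so taking min() picks the
    cheapest alignment and, among ties, the one with the most hits.
    """
    prev = [(j, 0) for j in range(len(hyp) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        cur = [(i, 0)]
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                diagonal = (prev[j - 1][0], prev[j - 1][1] - 1)  # hit
            else:
                diagonal = (prev[j - 1][0] + 1, prev[j - 1][1])  # substitution
            deletion = (prev[j][0] + 1, prev[j][1])
            insertion = (cur[j - 1][0] + 1, cur[j - 1][1])
            cur.append(min(diagonal, deletion, insertion))
        prev = cur
    return -prev[-1][1]


def word_information_lost(reference: str, hypothesis: str) -> float:
    """WIL = 1 - (H / N1) * (H / N2), with whitespace tokenization."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    if not hyp:
        return 1.0  # nothing was recognized, so all information is lost
    hits = count_hits(ref, hyp)
    return 1.0 - (hits / len(ref)) * (hits / len(hyp))


if __name__ == "__main__":
    # 5 hits, N1 = 6, N2 = 5, so WIL = 1 - (5/6) * (5/5)
    print(round(word_information_lost("the cat sat on the mat",
                                      "the cat sat on mat"), 4))
```

Because H is divided by both N1 and N2, the score penalizes errors relative to the reference length and the hypothesis length at once, which is one way it differs from word error rate.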
0
votes
0 answers

Speaker Diarization is disabled even for supported languages in Google Speech-to-Text API V2

I'm trying to use Google's Speech-to-Text v2 API for transcription and speaker diarization. Per this supported languages page, I should be able to create a Recognizer using the "long" model for the language "en-US" that supports diarization. And yet…
0
votes
1 answer

How does placing the output (word) labels on the initial transitions of the words in an FST lead to effective composition?

I am going through hbka.pdf (the WFST paper): https://cs.nyu.edu/~mohri/pub/hbka.pdf. A WFST figure is included for reference. Here, the input label i, the output label o, and the weight w of a transition are marked on the corresponding directed arc by i: o/w. It does not…
0
votes
0 answers

How do you use TensorFlowASR deepspeech2?

Here is the documentation: "See python examples/deepspeech2/train_*.py --help". Here is the help: python examples/deepspeech2/test.py --help 2023-05-29 10:14:55.122077: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is…
0
votes
0 answers

Fine-tuned Whisper-medium always predicts "" for all samples

I'm trying to fine-tune whisper-medium for the Korean language. Here is the tutorial that I followed. And here is my experiment…
-1
votes
0 answers

Does NVIDIA NeMo support the Android platform?

I want to explore NeMo ASR, but I didn't find any documentation related to Android in it. Does NVIDIA NeMo support the Android platform? Are any working Android samples using NeMo available? Or are there any proper steps/APIs available to integrate…