The translation of spoken words into text. Synonyms include automatic speech recognition (ASR), computer speech recognition, and speech to text (STT).
Questions tagged [speech-to-text]
2372 questions
0 votes, 2 answers
Does converting from mulaw to linear impact audio quality?
I want to change the audio encoding from mulaw to linear in order to use a linear speech recognition model from Google.
I'm using a telephony channel, so the audio is encoded as mulaw, 8-bit, 8000 Hz.
When I use Google's mulaw model, there are some issues with…

ylvi-bux
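On the quality question: in G.711 μ-law each 8-bit code stands for one fixed linear amplitude, so expanding it to 16-bit linear PCM (Google's LINEAR16 encoding) is lossless with respect to the mulaw data. It cannot restore detail that was discarded when the audio was first companded to 8 bits, but it does not degrade it further, and the sample rate stays at 8000 Hz. The sketch below is the standard G.711 μ-law expansion written out in Python; the function names are illustrative, and it assumes headerless mulaw bytes in and little-endian 16-bit samples out. (Python's audioop.ulaw2lin did the same job, but audioop was removed in Python 3.13.)

```python
import struct

def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit linear sample."""
    BIAS = 0x84
    code = ~code & 0xFF                # mu-law bytes are stored bit-inverted
    mantissa = code & 0x0F             # 4-bit step inside the segment
    exponent = (code & 0x70) >> 4      # 3-bit segment (chord) number
    magnitude = ((mantissa << 3) + BIAS) << exponent
    # After inversion, a set top bit marks a negative sample
    return BIAS - magnitude if code & 0x80 else magnitude - BIAS

def ulaw_bytes_to_linear16(data: bytes) -> bytes:
    """Convert raw mu-law bytes to little-endian 16-bit PCM (LINEAR16)."""
    samples = [ulaw_to_linear(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)
```

The output is twice as many bytes (two per sample instead of one) but carries exactly the same amplitude levels as the input; whether the linear model then recognizes it better is something only a test on your own audio can show.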
0 votes, 2 answers
Google Speech to Text: InvalidArgument: 400 Must use single channel (mono) audio, but WAV header indicates 1 channels
I am using the Google Cloud Platform to convert some audio into text files through the Google Speech-to-Text API. I keep getting the error: google.api_core.exceptions.InvalidArgument: 400 Must use single channel (mono) audio, but WAV header…

Mark
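Because the error is reported against the WAV header, it can help to read that header locally and compare it with what the request declares (for example the sample_rate_hertz and audio_channel_count fields of RecognitionConfig). Below is a minimal sketch using Python's standard wave module. describe_wav is a hypothetical helper. It only reports what the file declares and does not by itself explain the contradictory message.

```python
import io
import wave

def describe_wav(source) -> dict:
    """Return the WAV header fields a speech API is likely to check."""
    # wave.open accepts a file path or a binary file-like object
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }

# Example: write a 1-second, 16-bit, 16 kHz mono WAV into memory, then inspect it.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buf.seek(0)
info = describe_wav(buf)
print(info)
```

Running describe_wav on the actual file being uploaded shows whether its channel count and sample rate match the values sent in the request config.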
0 votes, 0 answers
Android Java, Speech to text and text to speech - What is a good way to stop the app from listening to itself?
I made an app in Android Studio with Java that listens to speech, recognizes it, converts it to text and, when I allow it to, responds with text to speech.
In the beginning I got a lot of self-repetition, because it listened to itself,…

Berit Larsen
0 votes, 1 answer
Using the Google Cloud Speech-to-Text .proto file with gRPC
I am trying to use Google's .proto file (https://github.com/googleapis/nodejs-speech/blob/main/protos/google/cloud/speech/v1p1beta1/cloud_speech.proto) in NestJS, and I get the error:
Is anyone else facing the same problem?
P.S. I have…
0 votes, 0 answers
Running speech recognizer permanently
I'm trying to figure out a way to run SpeechRecognizer permanently. The goal is for my app to listen to everything I say and react when I say "OK Phone".
I have tried this with SpeechRecognizer in a service. So there was a loop,…

xRay
0 votes, 1 answer
Detect voices, roles and probably even prosody and dysfluency in speech from an audio file
Google/YouTube automatic speech recognition generates subtitles without marking which voice is speaking.
A lecture has only one voice, but when people are having a conversation, or when more than one person serves as a talking head, the STT…

Joe Weinberg
0 votes, 1 answer
Subprocess call error while calling generate_lm.py of DeepSpeech
I am trying to build a customised scorer (language model) for speech-to-text using DeepSpeech in Colab. When calling generate_lm.py, I get this error:
    main()
  File "generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
…

Anjaly Vijayan
0 votes, 1 answer
SpeechRecognizer restarts repeatedly instead of running permanently
I'm trying to develop something similar to Google Assistant: when I say "OK app", the app should handle it. So far I have created a service that runs in the background:
public class SttService extends Service implements RecognitionListener {
…

xRay
0 votes, 0 answers
How to use Vosk models from WebSocket online server?
I have been developing an Android app that uses the speech recognition service, but the Android device has no Google app installed. For that reason, I'm using the Vosk API for speech recognition, but for better recognition accuracy I need…

Dev Mishra
0 votes, 1 answer
What is the purpose of EXTRA_CALLING_PACKAGE in Android Studio?
I'm writing STT in Android Studio and have a question about a few lines of code.
intent=new…

Hyunjin Cho
0 votes, 1 answer
Python SpeechRecognition doesn't listen to full audio?
I'm simply trying to get a transcript from an audio file using Python SpeechRecognition. No matter what pause_threshold, duration or other setting I use, it always gives me exactly the same output, approximately 30 seconds out of 80…

eeveepotato
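One commonly suggested workaround is to split the file into shorter pieces and transcribe each piece separately. This assumes the truncation comes from a per-request length limit on the recognition backend rather than from pause_threshold or duration. Below is a minimal sketch using Python's standard wave module. split_wav and the 25-second default are illustrative choices, not values from the question or from the SpeechRecognition library. Each returned chunk is a complete WAV file held in memory.

```python
import io
import wave

def split_wav(source, chunk_seconds: float = 25.0) -> list:
    """Split a WAV file into in-memory WAV chunks of at most chunk_seconds each."""
    chunks = []
    with wave.open(source, "rb") as wav:
        params = wav.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:  # end of the audio data
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as out:
                out.setparams(params)  # same channels, sample width and rate as the source
                out.writeframes(frames)  # header frame count is patched to the chunk length
            buf.seek(0)
            chunks.append(buf)
    return chunks
```

Each chunk could then be wrapped in sr.AudioFile (which accepts file-like objects) and recognized in turn, joining the partial transcripts. Cutting at fixed offsets can split a word in half, so a small overlap or silence-based splitting may give cleaner results.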
0 votes, 0 answers
Why does live speech to text with Watson Speech to Text return Handshake status 403 Forbidden?
I followed the tutorial Live speech to text with Watson, and it returns Handshake status 403 Forbidden.
I followed it to the letter and I don't know what is missing. Can someone help me?
I'm using Windows.
transcribe.py : transcribe.py
REGION_MAP…

Sarindra Thérèse
0 votes, 0 answers
How to analyze speech samples sent to Google STT?
I am looking for a recording option in the Google STT API. I want to listen to the audio received by the Google Speech API. How can I access the stream the API receives?
I also tried Wireshark on the server that is using streamingRecognize, but I do not…

Vladimir B
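One way to hear exactly what goes to streamingRecognize is to copy each audio chunk to a local WAV file just before it is handed to the client library. This avoids relying on a packet capture, which is unlikely to help because the connection to the API is normally TLS-encrypted. Below is a minimal sketch in Python. tee_to_wav is a hypothetical helper, and it assumes the chunks are raw 16-bit LINEAR16 PCM with a known sample rate and channel count. The generator yields every chunk unchanged, so it can wrap whatever iterator currently produces the audio.

```python
import io
import wave
from typing import BinaryIO, Iterable, Iterator, Union

def tee_to_wav(chunks: Iterable[bytes], target: Union[str, BinaryIO],
               sample_rate: int = 16000, channels: int = 1,
               sample_width: int = 2) -> Iterator[bytes]:
    """Yield each raw PCM chunk unchanged while also writing it to a WAV file."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # local copy to listen to later
            yield chunk             # the same bytes continue on to the API
```

With the google-cloud-speech client, the wrapped generator would sit where the audio chunks are turned into StreamingRecognizeRequest objects. The exact wiring depends on how the existing code builds its requests. The WAV file is finalized once the stream has been fully consumed.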
0 votes, 2 answers
How to increase the accuracy of the Speech to Text converter
I am using speech-to-text conversion in my application, based on the Android APIs. It works pretty well, but it currently uses a US accent as its basis. This results in the application sometimes matching words entirely different from what I…

star angel
0 votes, 1 answer
SpeechRecognizer result overwrites existing EditText text (Android Studio)
I am implementing speech to text with SpeechRecognizer, but when the recognized text is set on the EditText, the existing text in the EditText is erased and replaced by the text converted from speech.
But I want the converted text to…

Harshitha