Questions tagged [speech-to-text]

The translation of spoken words into text. Possible synonyms include automatic speech recognition, ASR, computer speech recognition, speech to text, STT.

2372 questions
5 votes, 3 answers

Transcribing mp3 to text (python) --> "RIFF id" error

I am trying to convert an mp3 file to text, but my code returns the error outlined below. Any help is appreciated! This is a sample mp3 file, and below is what I have tried: import speech_recognition as sr print(sr.__version__) r =…
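
A "RIFF id" error generally comes from a WAV reader that was handed a file not starting with the bytes `RIFF`, which is what would happen if an mp3 were passed where WAV is expected. The sketch below is only a diagnostic under that assumption: it guesses the container from the first few bytes. The function names `sniff_audio_container` and `sniff_file` are hypothetical helpers, not part of speech_recognition.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    # WAV: "RIFF" <4-byte size> "WAVE"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # FLAC streams start with the marker "fLaC"
    if header[:4] == b"fLaC":
        return "flac"
    # MP3 with an ID3v2 tag starts with "ID3"
    if header[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set (0xFF followed by 0b111xxxxx)
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as handle:
        return sniff_audio_container(handle.read(12))
```

If this reports "mp3", converting the file to WAV (for example with an external tool) before passing it to the recognizer is one plausible way forward.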
5 votes, 4 answers

Expo Voice Recognition

I've been trying to implement voice recognition in my Expo app. I tried a speech-to-text library called react-native-voice, but it does not support Expo. Does anyone know of another library that I can use? I have read some articles on using…
5 votes, 1 answer

Subtitles/captions with Microsoft Azure Speech-to-text in Python

I've been trying to figure out how to make subtitles with Microsoft Azure Speech Recognition service in Python, but can't figure it out. I've followed the tips someone else has answered here on getting the individual words, but even formatting those…
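
Once word or phrase timings are available, the remaining work for subtitles is mostly formatting. The sketch below assumes timings arrive as an offset and a duration in 100-nanosecond ticks (an assumption about the service's output, so check the actual response) and renders them as SubRip (SRT) cues. `ticks_to_srt` and `format_cue` are hypothetical helper names.

```python
TICKS_PER_MS = 10_000  # assumes 100-nanosecond ticks


def ticks_to_srt(ticks: int) -> str:
    """Convert a tick count to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = ticks // TICKS_PER_MS
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def format_cue(index: int, offset_ticks: int, duration_ticks: int, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the text."""
    start = ticks_to_srt(offset_ticks)
    end = ticks_to_srt(offset_ticks + duration_ticks)
    return f"{index}\n{start} --> {end}\n{text}\n"
```

Joining the cues with blank lines between them yields a complete `.srt` file.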
5 votes, 0 answers

Alternative to Deprecated WebSpeech API Function

I need to work with audio, and I need to use the following line of code after creating an AudioContext: audioContext.createScriptProcessor(4096, 1, 1); Although this still works, I see that this is deprecated and replaced: Deprecated This feature…
5 votes, 2 answers

API or SDK to make speech recognition only for numbers (between 1 and 10000)?

I need a specialized solution, optimized to detect numbers between 1 and 1000, for use on a smartphone. Ideally the SDK would work offline. Any ideas? I cannot find any configuration for Google Speech or Amazon Transcribe to…
fvisticot • 7,936 • 14 • 49 • 79
5 votes, 2 answers

Transcribe an Audio File in Python

I'm trying to transcribe an audio file which is a bit large. Its properties are as follows. Size: 278.3 MB, Duration: 52 minutes, Format: WAV. The following is the code I used to convert it in 60-second durations. Could you please advise to…
Nilani Algiriyage • 32,876 • 32 • 87 • 121
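
The 60-second splitting the question describes can be done with the standard library alone. This is a minimal sketch, not the asker's code: it assumes an uncompressed PCM WAV file, and `split_wav` is a hypothetical helper that yields each chunk as an in-memory WAV so it can be handed to whatever recognizer is in use.

```python
import io
import wave


def split_wav(source, chunk_seconds=60):
    """Yield in-memory WAV files of at most chunk_seconds each.

    source may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        params = wav.getparams()
        frames_per_chunk = wav.getframerate() * chunk_seconds
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as out:
                # Copy channel count, sample width and rate; the frame
                # count in the header is corrected when the writer closes.
                out.setparams(params)
                out.writeframes(frames)
            buffer.seek(0)
            yield buffer
```

Reading one chunk at a time keeps memory use bounded, which matters for a file of several hundred megabytes.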
5 votes, 1 answer

Android 2.2: Where is the option for speech input in the emulator?

My Nexus One has it: Settings includes a "Voice recognizer settings" in the list of "Voice input & output settings". Google Search has a microphone button next to it, so when I touch it, a dialog prompts me to say what I want to search. On the…
5 votes, 1 answer

Google Cloud Platform: Speech to Text Conversion of Large Media Files

I'm trying to extract text from an mp4 media file downloaded from YouTube. Since I'm using Google Cloud Platform, I thought I'd give Google Cloud Speech a try. After all the installations and configurations, I copied the following code snippet to get…
5 votes, 1 answer

How to improve the accuracy for speech-to-text conversion using recognize_sphinx API in Python

How can we improve the accuracy of speech-to-text conversion using the recognize_sphinx API in Python? Please find the code below, whose accuracy needs to be improved: import speech_recognition as sr # Obtain path to "english.wav" in the same…
5 votes, 1 answer

iOS Speech-to-text AVAudioInputNode(?) random crash

I have a speech-to-text function in my app: press and hold the button, and a view controller is animated from outside the window bounds into view and recording starts; release the button, and recording stops and the view is animated out of the window bounds. Suddenly I'm…
5 votes, 1 answer

Android Bluemix not showing speaker tag

I am using IBM Bluemix to transcribe some audio, and I want to use the API's speaker recognition. I set up the recognizer like this: private RecognizeOptions getRecognizeOptions() { return new RecognizeOptions.Builder() …
bear • 663 • 1 • 14 • 33
5 votes, 1 answer

How to provide hint to iOS speech recognition API?

I want to create an app that receives voice input using the iOS speech API. In Google's API, there is a speechContext option with which I can provide hints or bias toward some uncommon words. Does the iOS API provide this feature? I've been searching the site for…
Thanon K • 177 • 8
5 votes, 0 answers

Why doesn't the Google Cloud Speech API transcribe the whole audio file?

I'm trying to transcribe a short interview audio file with the Google Cloud Speech API (asynchronously), but it only transcribes the first half minute of the recording. I made several attempts with recordings longer than one minute and the results were…
5 votes, 1 answer

Using Bing Speech example with Curl

First off, I don't really know what I'm doing, so I apologize for the stupid question... just trying to follow the instructions here: https://www.microsoft.com/cognitive-services/en-us/Speech-api/documentation/GetStarted/GetStarted-cURL using cURL…
AnarchyJim • 51 • 1
5 votes, 1 answer

How to train an LSTM for speech recognition

I'm trying to train an LSTM model for speech recognition but don't know what training data and target data to use. I'm using the LibriSpeech dataset, which contains both audio files and their transcripts. At this point, I know the target data will be…
JorgeC • 615 • 3 • 7 • 13