
I want to create an app that records what you say into the microphone and extracts all the words.

I know this is a problem many companies and individuals are working on, but I am not quite sure how far we are from developing tools that are good at this.

Also, are there any publicly available tools for this? I was hoping Google Assistant, Apple Siri or a similar service provides an API where I can upload an audio clip and get back the words that were said.

Jamgreen
  • Have you stumbled upon [**pocketsphinx.js**](https://github.com/syl22-00/pocketsphinx.js)? Might be worth a try. – Tholle May 31 '17 at 09:19
  • Not sure how useful it would be to you, but I've had pretty good results with the Amazon Echo and its developer tools. Unfortunately you need an actual device to try out the voice recognition yourself; the dev kit only allows you to type things in that will get passed to your 'skill'. – Herr Pink May 31 '17 at 09:22

2 Answers

3

Although Google does have a Google Assistant SDK, it is primarily aimed at sending audio from your software or device and receiving an audio response from the Assistant - just like you would get on a Google Home. Similarly, Actions on Google are meant to handle all the Natural Language Processing (NLP) and give you a response - not to give you exactly what was said (although that is a side effect).

It sounds more like you want the Cloud Speech API, which is a speech-to-text (STT) system. You may want to combine this with something like the Cloud Natural Language API, which can then parse meaning from the text produced.
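
As a rough illustration of the speech-to-text step, here is a minimal sketch of a synchronous call to the Cloud Speech REST endpoint (`v1/speech:recognize`). It assumes raw 16-bit PCM audio sampled at 16 kHz and an API key passed as a query parameter; the helper names `build_request_body`, `extract_words` and `transcribe` are made up for this example and are not part of any Google library.

```python
import base64
import json
import re
import urllib.request

# REST endpoint of the Cloud Speech API (v1) for short, synchronous recognition.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_request_body(audio_bytes, sample_rate_hz=16000, language_code="en-US"):
    """Build the JSON body for a synchronous recognize call.

    Assumption: the audio is raw 16-bit linear PCM; change "encoding"
    (and the sample rate) if your recording uses another format.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # The API expects the audio payload as base64 text inside the JSON.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def extract_words(response):
    """Return the words of the top alternative of each result, in order."""
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcript = alternatives[0].get("transcript", "")
            words.extend(re.findall(r"[\w']+", transcript))
    return words


def transcribe(audio_bytes, api_key):
    """Send the audio to the API and return the recognized words (needs network)."""
    request = urllib.request.Request(
        RECOGNIZE_URL + "?key=" + api_key,
        data=json.dumps(build_request_body(audio_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_words(json.load(reply))
```

With a valid key, something like `transcribe(open("clip.raw", "rb").read(), "YOUR_API_KEY")` should return a list of words; `clip.raw` and the key are placeholders. The synchronous endpoint is meant for short clips, so longer recordings would need the asynchronous or streaming variants instead.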

Prisoner
2

Microsoft has the Bing Speech API, which is used to process audio and extract the words spoken.

They also have the Custom Speech Service and the Speaker Recognition API.

The Custom Speech Service is used to overcome speech recognition barriers such as speaking style, vocabulary and background noise.

The help docs and samples available are a great place to start.