
I'm trying to use the WebKit Speech Recognition API to recognize single syllables, such as "ah" or "bi", rather than full words or sentences. Since this API uses a "grammar" definition, I wonder if there is a way to implement single-syllable recognition.

Thanks

Forepick
  • There is a way to set `grammars` on `SpeechRecognition` (https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/grammars), but you need syllables, and there are a lot of syllables. Perhaps you could create your own grammar using https://github.com/tur-nr/node-jspeech. – Sep 26 '20 at 22:32
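A minimal sketch of the grammar approach the comment suggests, in TypeScript: it builds a JSGF grammar string whose single public rule accepts one syllable from a list. The function name `buildSyllableGrammar` and the syllable list are hypothetical. In a browser, the string would be passed to `SpeechGrammarList.addFromString()` and the list assigned to `recognition.grammars`; as the answer below explains, this does not add syllables to the recognizer's vocabulary, so it is unlikely to be enough on its own.

```typescript
// Hypothetical helper: builds a JSGF grammar whose single public rule
// accepts exactly one syllable from the given list.
function buildSyllableGrammar(syllables: string[], name = "syllables"): string {
  if (syllables.length === 0) {
    throw new Error("At least one syllable is required");
  }
  // To keep the grammar trivially valid, only accept plain lowercase letters.
  for (const s of syllables) {
    if (!/^[a-z]+$/.test(s)) {
      throw new Error(`Unsupported syllable token: ${JSON.stringify(s)}`);
    }
  }
  const alternatives = syllables.join(" | ");
  return `#JSGF V1.0; grammar ${name}; public <syllable> = ${alternatives} ;`;
}

const grammar = buildSyllableGrammar(["ah", "bi", "ka"]);
console.log(grammar);
// prints: #JSGF V1.0; grammar syllables; public <syllable> = ah | bi | ka ;
```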

1 Answer


Unfortunately, this isn't possible with the Web Speech API. Although you can create custom grammars (which specify the words and phrases to expect), you can't define custom dictionaries or vocabularies (the set of words the recognizer actually knows). In your case, you'd need a custom vocabulary in which each individual phoneme or syllable is its own "word", and then a grammar that only accepts words from that custom vocabulary. A few paid cloud-based services let you do this.

For example, using IBM Watson, you could create a custom language model and then add words to the model (in your case, each phoneme would be a "word"). Normally, a custom language model is blended with a general language model, but you wouldn't want that, so you would set the customization weight to 1.0 (meaning it would only use your custom language model).
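A sketch of the two requests this would involve, assuming the Watson Speech to Text REST API's custom-words endpoint (`POST /v1/customizations/{customization_id}/words`, with a `words` array of `word` / `sounds_like` / `display_as` entries) and the `language_customization_id` and `customization_weight` query parameters of `/v1/recognize`. It only builds the request body and URL, with no network calls or authentication; `serviceUrl` and `customizationId` are placeholders, and the field names should be checked against IBM's current API reference.

```typescript
// One entry of the custom-words request body (field names assumed from
// IBM's documented format; verify against the current API reference).
interface CustomWord {
  word: string;
  sounds_like: string[];
  display_as: string;
}

// Build a body that adds each syllable to the custom model as its own "word".
function buildCustomWordsBody(syllables: string[]): { words: CustomWord[] } {
  return {
    words: syllables.map((s) => ({ word: s, sounds_like: [s], display_as: s })),
  };
}

// Build a /v1/recognize URL that selects the custom model and sets its weight.
// serviceUrl and customizationId are placeholders obtained from IBM Cloud.
function buildRecognizeUrl(serviceUrl: string, customizationId: string, weight = 1.0): string {
  // The customization weight is documented as a value between 0.0 and 1.0.
  if (weight < 0 || weight > 1) {
    throw new Error("customization_weight must be between 0.0 and 1.0");
  }
  const base = serviceUrl.replace(/\/+$/, "");
  const query = [
    `language_customization_id=${encodeURIComponent(customizationId)}`,
    `customization_weight=${String(weight)}`,
  ].join("&");
  return `${base}/v1/recognize?${query}`;
}

const body = buildCustomWordsBody(["ah", "bi"]);
console.log(JSON.stringify(body));
console.log(buildRecognizeUrl("https://example.com/instances/my-instance", "my-custom-model-id"));
```

The `sounds_like` spellings here simply repeat each syllable; in practice you would probably tune them so the service maps each sound to the intended entry.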

There are other ways to go about it, too, but I doubt you'd find a purely web-based solution that doesn't involve a paid service. If you can move to a native platform (or run your own speech service on a server), you have more options. For example, CMUSphinx lets you create a custom dictionary to use with Sphinx4 on the server or PocketSphinx on mobile. Although CMUSphinx isn't the most accurate system for large-vocabulary applications, your custom vocabulary would be tiny, so it should perform well.
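For the CMUSphinx route, a sketch of generating the two files a small-vocabulary setup needs: a pronunciation dictionary, where each line is a word followed by its ARPAbet phones, and a JSGF grammar that accepts exactly one of those words. The syllable-to-phone mapping below is a hypothetical example, not a vetted lexicon, and the helper names are illustrative.

```typescript
// The 39 ARPAbet phones used by the CMU Pronouncing Dictionary
// (the phone set of CMUSphinx's default US English model).
const CMU_PHONES = new Set([
  "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
  "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY",
  "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]);

// Hypothetical mapping from each syllable to its pronunciation.
type Lexicon = Record<string, string[]>;

// Dictionary file: one "word PHONE PHONE ..." line per entry.
function buildDictionary(lexicon: Lexicon): string {
  const lines: string[] = [];
  for (const [word, phones] of Object.entries(lexicon)) {
    for (const p of phones) {
      if (!CMU_PHONES.has(p)) throw new Error(`Unknown phone ${p} in "${word}"`);
    }
    lines.push(`${word} ${phones.join(" ")}`);
  }
  return lines.join("\n") + "\n";
}

// Grammar file: a single public rule that matches exactly one syllable.
function buildGrammar(lexicon: Lexicon): string {
  const alternatives = Object.keys(lexicon).join(" | ");
  return `#JSGF V1.0;\ngrammar syllables;\npublic <syllable> = ${alternatives};\n`;
}

const lexicon: Lexicon = { ah: ["AA"], bi: ["B", "IY"], ka: ["K", "AA"] };
console.log(buildDictionary(lexicon));
console.log(buildGrammar(lexicon));
```

PocketSphinx accepts such files through its `-dict` and `-jsgf` options, and Sphinx4's `Configuration` has corresponding settings (`setDictionaryPath`, `setGrammarPath`, `setGrammarName`, `setUseGrammar`).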

David Jones
  • Thanks, David, for the detailed answer. So, as I understand it, I'll have to create my own web service (probably hosted in the cloud) and have the web browser send it a short recording of the spoken syllable. Is that correct? – Forepick Oct 01 '20 at 08:56
  • @Forepick If you need to be on the web (not native mobile or desktop), then yes, you'll need to do the speech recognition on the server (whether it's your own or a cloud service). I would set up Sphinx4 on a server and then use WebSockets to stream the audio data to the server in real time. Then you can send the recognition response back over the same socket. – David Jones Oct 01 '20 at 15:09
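On the browser side of the WebSocket setup described in this comment, microphone audio (Float32 samples from the Web Audio API, typically at 44.1 or 48 kHz) has to be converted into the format the recognizer expects. The sketch below assumes the server runs Sphinx4 with its default US English model, which expects 16 kHz, 16-bit, mono PCM; `toPcm16` is a hypothetical helper, and its averaging downsampler is deliberately simple.

```typescript
// Convert Float32 samples in [-1, 1] at inputRate Hz into 16-bit
// little-endian mono PCM at outputRate Hz. Downsampling averages each
// output window, a crude low-pass; a real pipeline might use a proper resampler.
function toPcm16(samples: Float32Array, inputRate: number, outputRate = 16000): ArrayBuffer {
  if (outputRate > inputRate) throw new Error("Upsampling is not supported");
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(samples.length / ratio);
  const buffer = new ArrayBuffer(outLength * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < outLength; i++) {
    // Average the input samples that fall into this output sample's window.
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    const avg = end > start ? sum / (end - start) : 0;
    // Clamp to [-1, 1], then scale to the signed 16-bit range (little-endian).
    const clamped = Math.max(-1, Math.min(1, avg));
    view.setInt16(i * 2, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
  }
  return buffer;
}

// Example: 0.1 s of a 440 Hz tone captured at 48 kHz.
const inputRate = 48000;
const tone = new Float32Array(inputRate / 10);
for (let i = 0; i < tone.length; i++) {
  tone[i] = 0.5 * Math.sin((2 * Math.PI * 440 * i) / inputRate);
}
const pcm = toPcm16(tone, inputRate);
console.log(pcm.byteLength); // 3200 bytes = 1600 samples at 16 kHz
```

Each converted chunk could then be sent with `socket.send(buffer)`, since `WebSocket.send()` accepts an `ArrayBuffer`; on the server, the received bytes could be fed to Sphinx4's `StreamSpeechRecognizer` as an input stream.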