Greetings,

I am currently trying to implement speech recognition in my application. According to the JS documentation here, speech-to-text has been supported since Safari 14.1. I am using the following configuration:

    // Safari and Chrome expose the constructor under the webkit prefix
    const { webkitSpeechRecognition } = (window as any);
    const recognition = new webkitSpeechRecognition();
    recognition.lang = 'pt-BR';
    recognition.continuous = true;       // keep listening after each result
    recognition.interimResults = false;  // deliver final results only
    recognition.maxAlternatives = 1;
    // Avoid garbage collection bugs
    this.garbage.push(recognition);
    recognition.start();

On Chrome it works just fine, but on Safari the recognition results are very poor. It understands me sometimes, but it often misinterprets my words and gives me wrong results. For example, if I say "Hello assistant, change contrast", the result might be something like "Hello assist charge contract hello assist charge charge".

One peculiarity of this problem is that the only events fired by the speech recognition interface on Safari are start and audiostart.
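
To see exactly which events fire in each browser, one listener per event can be attached to the recognition object. The sketch below is a minimal, hypothetical helper: `logRecognitionEvents` and `ListenerTarget` are names made up for illustration, while the event names are the ones the Web Speech API defines for SpeechRecognition.

```typescript
// Event names defined for SpeechRecognition in the Web Speech API.
const RECOGNITION_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart',
  'result', 'nomatch', 'error',
  'speechend', 'soundend', 'audioend', 'end',
] as const;

// Minimal structural type (hypothetical) so the helper accepts the
// recognition object as well as a test double.
interface ListenerTarget {
  addEventListener(type: string, listener: (event: unknown) => void): void;
}

// Registers one listener per event name. The returned array is filled in
// as events fire, so it records the order in which they were observed.
function logRecognitionEvents(
  target: ListenerTarget,
  onEvent: (name: string) => void,
): string[] {
  const seen: string[] = [];
  for (const name of RECOGNITION_EVENTS) {
    target.addEventListener(name, () => {
      seen.push(name);
      onEvent(name);
    });
  }
  return seen;
}
```

Calling `logRecognitionEvents(recognition, (name) => console.log('fired:', name))` before `recognition.start()` prints each event as it arrives, which makes it easy to compare the sequence on Chrome and Safari.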

Is anyone facing a similar issue, or has anyone found a solution to this problem? I am also open to alternatives for implementing speech recognition in my application.

Thanks in advance!


EDIT

On my end, you can see this problem by visiting any website that relies on the Web Speech API. Some examples that you can check:

https://www.google.com/chrome/demos/speech.html

https://www.audero.it/demo/web-speech-api-demo.html

  • In Chrome it's using a totally different technology. Basically this question is Hey Google vs Hey Siri. Question, do you have the same issues with "regular" Siri? – James Mar 11 '22 at 19:50
  • Using the Siri software on my Mac it seems to understand me perfectly. The problem seems to be with the Web Speech API on the Safari browser. I made an edit with some links so that people can reproduce this bug. – Álvaro José Baranoski Mar 14 '22 at 12:54
  • Hello, I am facing the same issue, no fixes yet :( ? – Syed M. Sannan Jul 04 '22 at 19:27
  • Hi @Stranger, I have just posted what I did to work around this issue. Shout out if you think it was useful! =D – Álvaro José Baranoski Jul 05 '22 at 20:54
  • I have the same problem enabling speech-to-text on browsers other than Chrome. I'm using [react-speech-recognition npm](https://github.com/JamesBrill/react-speech-recognition#readme) as an interface to the [WebSpeech API](https://wicg.github.io/speech-api/#speechreco-section). react-speech-recognition offers a "[polyfill](https://webspeechrecognition.com/polyfills)" capability to fall back on a speech-to-text service other than the browser's own WebSpeech API support. I plan to interface [Vosk](https://alphacephei.com/vosk/) (mentioned in Alvaro's answer) offline as a polyfill. It will require – John Rizzo Feb 08 '23 at 10:48

1 Answer

So, if anyone else stumbles upon this problem: I have filed an issue on the Chromium forum. You can consult the issue here.

Basically, the Chrome team is having some problems integrating this functionality into their browser on iOS devices.

In my case, what I did was use Hark.js to get events for when the user starts and stops speaking, paired with Vosk on my backend to do the offline speech-to-text transcription.
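
A minimal sketch of how the two pieces could be wired together, under assumptions: `SpeechEvents` models only the `speaking` and `stopped_speaking` events fired by the emitter that `hark(stream)` returns, and `startCapture`, `stopCapture`, `transcribe`, `onText` and `onError` are hypothetical hooks (for example a MediaRecorder wrapper and a request to a Vosk-backed endpoint), not part of either library.

```typescript
// Subset of the emitter returned by hark(stream) that this sketch relies on.
interface SpeechEvents {
  on(event: 'speaking' | 'stopped_speaking', handler: () => void): void;
}

// Hypothetical hooks supplied by the application.
interface UtterancePipeline<TAudio> {
  startCapture(): void;                       // begin recording the microphone
  stopCapture(): Promise<TAudio>;             // resolve with the recorded audio
  transcribe(audio: TAudio): Promise<string>; // e.g. send to a Vosk-backed service
  onText(text: string): void;                 // receive the transcript
  onError(error: unknown): void;              // receive capture or transcription errors
}

// Records one utterance per speaking/stopped_speaking pair and forwards
// the recorded audio to the transcription hook.
function wireUtterances<TAudio>(
  events: SpeechEvents,
  pipeline: UtterancePipeline<TAudio>,
): void {
  let capturing = false;

  events.on('speaking', () => {
    if (capturing) return; // ignore repeated 'speaking' events
    capturing = true;
    pipeline.startCapture();
  });

  events.on('stopped_speaking', () => {
    if (!capturing) return; // nothing was being recorded
    capturing = false;
    pipeline
      .stopCapture()
      .then((audio) => pipeline.transcribe(audio))
      .then((text) => pipeline.onText(text))
      .catch((error: unknown) => pipeline.onError(error));
  });
}
```

Keeping the recorder and the backend call behind hooks means the same wiring can be exercised with fakes, and the transcription service can be swapped without touching the voice-activity logic.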

IMO the browser speech recognition API is fine if you want your app to run on a specific browser. However, if you wish to target all browsers across different operating systems, I would advise looking for a different solution.