
I just ran an interesting test: a speech recognition service listens to me, and NSSpeechSynthesizer echoes back what I said.

However, NSSpeechSynthesizer is notorious for being slow and unresponsive, and I wanted to know if anyone has tried optimising this by dedicating a core, a thread or the GPU (using Metal) to both recognition and synthesis.

I've been reading the following article to better understand how to pipeline values through a Metal buffer: http://memkite.com/blog/2014/12/30/example-of-sharing-memory-between-gpu-and-cpu-with-swift-and-metal-for-ios8/

The author uses Metal to offload the sigmoid function used in ML, which makes complete sense, as vector maths is what GPUs do best.
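For reference, this is roughly the per-element work the article moves to the GPU, written as a plain CPU baseline. It is only a minimal sketch: the `sigmoid` helper and the sample values are my own and are not taken from the article.

```swift
import Foundation

// Hypothetical helper (not from the article): element-wise sigmoid on the CPU.
// Each output depends only on its own input, which is why this maps well to a GPU.
func sigmoid(_ input: [Float]) -> [Float] {
    return input.map { 1.0 / (1.0 + exp(-$0)) }
}

// Illustrative sample values, standing in for a buffer of floats.
let samples: [Float] = [-1.0, 0.0, 1.0]
let activated = sigmoid(samples)
print(activated)
```

A Metal version would copy the same float array into a buffer and run the same formula once per element in a compute kernel.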

However, I would like to know if anyone has explored sending other types of data through it, such as float values from a waveform, for example to render synthesis on the GPU.

In particular, has anyone tried this for NSSpeechRecognizer or NSSpeechSynthesizer?

As it stands, I have a full 3D scene with 3D HRTF sound, and both recognition and synthesis work, but sometimes there's a noticeable lag. Would dedicating a buffer pipeline through the GPU (via MTLDevice) and then back to play the file help?

Eric Aya
triple7
  • I urge you to please change your question so it becomes a clear question about something you tried and did not work, with a code example if possible. In particular, you should avoid the `has anyone tried this` type of question because it provides grounds for your question to be closed by moderators. – gpu3d May 01 '16 at 17:43
  • There are many open source speech engines in ANSI C. You could optimize them with SIMD instructions for the Apple platform: https://github.com/mattt/Surge/blob/master/README.md – Darko May 01 '16 at 19:30
  • Hi, I have edited with code samples and an audio demo to make my question less generic, but for some reason it says my text has improperly indented code. Being on a screen reader, I'm not sure where it goes wrong. Will try again later. Here's the audio demo: https://dl.dropboxusercontent.com/u/3678065/test_recognition.mp3 – triple7 May 02 '16 at 01:23

0 Answers