How to implement word-by-word display with Microsoft Text to Speech?

Asked Aug 02 '23 at 07:48

Active Aug 02 '23 at 07:48

Viewed 11 times

I am building an app with React that uses Microsoft Speech to handle Text to Speech (TTS) tasks.

The app fetches the response from ChatGPT as a stream and feeds each complete sentence into the TTS queue. A text box displays all the tokens received so far. Because the tokens must first accumulate into a full sentence, and that sentence must then be converted into speech, the speech lags noticeably behind the displayed text.
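A minimal sketch of the sentence-buffering step described above, to make the source of the delay concrete. `SentenceBuffer`, `push`, and `flush` are hypothetical names, and the punctuation-based sentence split is an assumption (it would mis-split text such as abbreviations); it is not taken from the actual app.

```typescript
// Hypothetical helper: accumulates streamed tokens and emits complete sentences.
class SentenceBuffer {
  private pending = "";

  // Append one streamed token and return any sentences it completed.
  push(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    // Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call when the stream ends to release whatever text is left.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

Each sentence returned by `push` would be handed to the TTS queue, so nothing is spoken until a sentence boundary arrives, while the text box can already show every token.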

I want to display the text word by word in sync with Microsoft Speech. Does Microsoft TTS provide the timestamps at which each word is spoken? For example, something like this: input - "How are you?", output - [{word: "How", timestamp: 0}, {word: "are", timestamp: 0.5}, {word: "you?", timestamp: 0.9}]. Alternatively, is there an event that fires when each word is spoken?
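A rough sketch of how per-word events could be turned into the output shape above and used to drive the display. The `BoundaryEvent` shape is an assumption: it is modeled on the `wordBoundary` event that the JavaScript Speech SDK's `SpeechSynthesizer` appears to raise, with `text` and an `audioOffset` in 100-nanosecond ticks, and should be checked against the SDK documentation. `toWordTimings` and `wordsVisibleAt` are hypothetical helpers.

```typescript
// Assumed event shape: the spoken word and its audio offset in 100 ns ticks.
interface BoundaryEvent {
  text: string;
  audioOffset: number;
}

// Target shape from the question: word plus timestamp in seconds.
interface WordTiming {
  word: string;
  timestamp: number;
}

const TICKS_PER_SECOND = 10000000; // 1 tick = 100 ns

// Convert collected boundary events into {word, timestamp} pairs.
function toWordTimings(events: BoundaryEvent[]): WordTiming[] {
  return events.map((e) => ({
    word: e.text,
    timestamp: e.audioOffset / TICKS_PER_SECOND,
  }));
}

// Text that should be visible once `elapsedSeconds` of audio has played.
function wordsVisibleAt(timings: WordTiming[], elapsedSeconds: number): string {
  return timings
    .filter((t) => t.timestamp <= elapsedSeconds)
    .map((t) => t.word)
    .join(" ");
}
```

In a React component, `wordsVisibleAt` could be called with the audio element's current playback time on each animation frame, so the text box only shows words that have already been spoken.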


Tsuu


0 Answers