pyttsx3 prints the current word being uttered

Question

I basically want the TTS engine to talk while printing out what it is saying. I've pretty much copied and pasted the example from the pyttsx3 documentation to do this, but it just will not work.

import pyttsx3
def onStart(name):
   print ('starting', name)
def onWord(name, location, length):
   print ('word', name, location, length)
def onEnd(name, completed):
   print ('finishing', name, completed)
engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()

This is the result. The word event fires only once, after the speaking is complete, and none of the words are actually printed.

starting None
word None 1 0
finishing None True

I've been working on this for days. I've tried other libraries, like win32com.client.Dispatch('SAPI.SpVoice') and gTTS, but none seems able to do what I want. SAPI.SpVoice seems to have an event that would do what I want, but I can't get that to work either, though I'm not sure I'm using it correctly. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms723593(v=vs.85)

from win32com.client import Dispatch
import win32com.client

class ContextEvents():
    def onWord():
        print("the word event occured")
        
        # Work with Result
        
s = Dispatch('SAPI.Spvoice')
e = win32com.client.WithEvents(s, ContextEvents)
s.Speak('The quick brown fox jumped over the lazy dog.')

From what I understood, there needs to be a class for the events, and each handler in that class must be named in the form On(event), or something like that. I tried installing eSpeak, but that did not work out either. Keep in mind I'm kind of a newbie in Python, so if anyone would be willing to give a thorough explanation, that would be really great.

score 0 · Answer 1 · answered Apr 23 '21 at 22:42

I'm not familiar with that library, but most likely the audio stream is being generated and played before the events can be passed off to the wrapper library. AWS's Polly will output word-level timing information if you want to use that. You'd need two calls: one to get the audio stream and the other to get the word timings (Polly calls these speech marks).
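
A rough sketch of the two Polly calls, assuming boto3 is installed and AWS credentials are configured. The helper names and the "Joanna" voice are illustrative choices, not anything from the question. The timing response is newline-delimited JSON, one object per word, so the parsing is kept in its own function:

```python
import json


def parse_speech_marks(payload):
    """Turn Polly's newline-delimited JSON speech marks into (time_ms, word) pairs."""
    marks = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)
        # Keep only word marks; sentence or viseme marks are skipped.
        if mark.get("type") == "word":
            marks.append((mark["time"], mark["value"]))
    return marks


def fetch_audio_and_marks(text, voice_id="Joanna"):
    """Hypothetical helper: call Polly twice, once for audio and once for timings."""
    import boto3  # assumption: boto3 installed, credentials and region configured

    polly = boto3.client("polly")
    # Call 1: the audio itself.
    audio = polly.synthesize_speech(Text=text, VoiceId=voice_id, OutputFormat="mp3")
    # Call 2: the word timings, requested as JSON speech marks.
    timing = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="json", SpeechMarkTypes=["word"]
    )
    audio_bytes = audio["AudioStream"].read()
    marks = parse_speech_marks(timing["AudioStream"].read().decode("utf-8"))
    return audio_bytes, marks
```

You would then play the audio with whatever player you like and print each word once its time offset (in milliseconds) has elapsed.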

The Windows .NET System.Speech.Synthesis library does have progress events that you could listen for, but I don't know if there's a Python library that wraps it.
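
A related route that stays in Python is the SAPI.SpVoice COM object from the question. This is an untested sketch that assumes Windows with pywin32 installed. As far as I can tell from the SAPI automation docs, the Word event passes a stream number, stream position, character position and length, so the handler needs those parameters (plus self), and the speech has to run asynchronously while Python pumps COM messages, otherwise the events have no chance to be delivered. The function names here are made up for illustration:

```python
import time

SVSF_ASYNC = 1  # SAPI SpeechVoiceSpeakFlags.SVSFlagsAsync


def word_at(text, char_pos, length):
    """Slice the word SAPI reports out of the original text."""
    return text[char_pos:char_pos + length]


def speak_and_print(text):
    """Hypothetical helper: speak text via SAPI and print each word as it starts."""
    # Imported here because these modules only exist on Windows with pywin32.
    import pythoncom
    import win32com.client

    state = {"done": False}

    class VoiceEvents:
        # win32com maps the SAPI "Word" event to a method named OnWord.
        def OnWord(self, StreamNumber, StreamPosition, CharacterPosition, Length):
            print(word_at(text, CharacterPosition, Length))

        # "EndStream" fires when the utterance has finished playing.
        def OnEndStream(self, StreamNumber, StreamPosition):
            state["done"] = True

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    # Keep a reference to the handler so it is not garbage collected.
    handler = win32com.client.WithEvents(voice, VoiceEvents)
    voice.Speak(text, SVSF_ASYNC)  # returns immediately
    while not state["done"]:
        pythoncom.PumpWaitingMessages()  # lets queued COM events reach the handler
        time.sleep(0.01)
```

Calling speak_and_print('The quick brown fox jumped over the lazy dog.') should, under those assumptions, print one word per line in step with the audio.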

However, if you're willing to run a PowerShell command from Python, you can try this gist I wrote, which wraps the Windows synthesis functionality and outputs the word timings. Here's an example that should get you what you want:

$text = "hello world! this is a long sentence with many words";
$sampleRate = 24000;

# generate tts and save bytes to memory (powershell variable)
# events holds event timings
# NOTE: assumes out-ssml-winrt.ps1 is in current directory, change as needed...
$events = .\out-ssml-winrt.ps1 $text -Variable 'soundstream' -SampleRate $sampleRate -Channels 1 -SpeechMarkTypes 'words';

# estimate duration based on samplerate (rough)
$estimatedDurationMilliseconds = $global:soundstream.Length / $sampleRate * 1000;

$global:e = $events;

# add a final event at the end of the loop to wait for audio to complete
$events += @([pscustomobject]@{ type = 'end'; time = $estimatedDurationMilliseconds; value = '' });
# create background player
$memstream = [System.IO.MemoryStream]::new($global:soundstream);
$player = [System.Media.SoundPlayer]::new($memstream)
$player.Play();

# loop through word events
$now = 0;
$events | % {
    $word = $_;
    # milliseconds into wav file event happens
    $when = $word.time;
    # distance from last timestamp to this event
    $delta = $when - $now;
    # wait until right time to display
    if ($delta -gt 0) {
        Start-sleep -Milliseconds $delta;
    }
    $now = $when;
    # output word
    Write-Output $word.value;
}
# just to let you know - audio should be finished
Write-Output "Playback Complete";
$player.Stop(); $player.Dispose(); $memstream.Dispose();
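
To drive that from Python, something along these lines should work. It is only a sketch: it assumes the PowerShell snippet above has been saved to a file (speak_words.ps1 is a made-up name) next to out-ssml-winrt.ps1, and that powershell is on the PATH. Each line the script writes is yielded as soon as Python receives it, though how promptly PowerShell flushes its output to a pipe is something I have not verified:

```python
import subprocess


def powershell_command(script_path):
    """Build the argument list that runs a .ps1 file without loading the user profile."""
    return ["powershell", "-NoProfile", "-ExecutionPolicy", "Bypass", "-File", script_path]


def stream_lines(command):
    """Run a command and yield its stdout one line at a time while it is running."""
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\r\n")


# Usage (Windows only, hypothetical script name):
# for word in stream_lines(powershell_command("speak_words.ps1")):
#     print(word)
```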
