I need to analyse a sentence or phrase and output the time it takes to utter each word. For example, for the sentence
How can mirrors be real if our eyes aren't?
I need this output:
Word Time
--------- -------
How 101ms
can 95ms
mirrors 180ms
be 70ms
real 120ms
if 80ms
our 99ms
eyes 101ms
aren't? 180ms
(I made this one up; these are not the actual utterance times.)
One method of doing this is to assume that utterance time is proportional to word length, but this isn't always true: 'Queue' and 'Q' have the same utterance time although they differ in word length.
Also, the presence of punctuation marks has to be factored in.
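For reference, here is a minimal Java sketch of the naive word-length baseline with a fixed pause for trailing punctuation. The class name NaiveUtteranceTimer, the 25 ms-per-letter rate and the 80 ms pause are placeholders I made up for illustration, not measured values, and this is not how a real speech engine computes durations.

```java
// Naive baseline: time grows linearly with letter count, plus a fixed
// pause when the token ends in punctuation. All constants are assumptions.
final class NaiveUtteranceTimer {
    static final int MS_PER_LETTER = 25;      // assumed speaking rate
    static final int PUNCTUATION_PAUSE_MS = 80; // assumed pause length
    static final String PAUSE_MARKS = ".,;:!?";

    // Estimate the utterance time of one whitespace-delimited token.
    static int estimateMs(String token) {
        if (token.isEmpty()) {
            return 0;
        }
        int letters = 0;
        for (char c : token.toCharArray()) {
            if (Character.isLetter(c)) {
                letters++; // apostrophes and punctuation are not counted
            }
        }
        int ms = letters * MS_PER_LETTER;
        char last = token.charAt(token.length() - 1);
        if (PAUSE_MARKS.indexOf(last) >= 0) {
            ms += PUNCTUATION_PAUSE_MS;
        }
        return ms;
    }

    // Estimate every token of a sentence, in order.
    static int[] estimateAll(String sentence) {
        String[] tokens = sentence.trim().split("\\s+");
        int[] times = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            times[i] = estimateMs(tokens[i]);
        }
        return times;
    }

    public static void main(String[] args) {
        String sentence = "How can mirrors be real if our eyes aren't?";
        String[] tokens = sentence.trim().split("\\s+");
        int[] times = estimateAll(sentence);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " " + times[i] + "ms");
        }
    }
}
```

With these placeholder constants, 'Queue' comes out at 125ms and 'Q' at 25ms, which is exactly the weakness described above, so I am looking for something that works from pronunciation rather than spelling.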
Bonus: Recognizing Emotions :)
Can anyone point me to algorithms or papers that do this? Is there any way to hack this up from existing text-to-speech code? Java code suggestions are appreciated!