We have some proofread .srt files and we want to generate audio from them with AWS Polly. According to the AWS Polly documentation, Polly accepts either plain text or SSML-enhanced text as input. Is there a way to convert an .srt file to SSML-enhanced text?
We want to use the .srt files because they are proofread and because their timestamps record "audio pausing" information. For example:
1
00:00:04,960 --> 00:00:06,880
- [Instructor] Bacteria
are able to inhabit

2
00:00:06,880 --> 00:00:09,220
almost every environment on Earth,

3
00:00:09,500 --> 00:00:12,740
from desert tundra to
tropical rainforests.
There's a gap of 280 ms between the end of cue 2 (00:00:09,220) and the start of cue 3 (00:00:09,500); this is the "audio pausing" information we have.
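To make the arithmetic concrete, here is a small Python sketch of how we would compute that gap from the two timestamps. The helper name `srt_time_to_ms` is our own and is not part of any library.

```python
import re

def srt_time_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as '00:00:09,220' to milliseconds."""
    hours, minutes, seconds, millis = map(int, re.split(r"[:,]", stamp))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

# End of cue 2 and start of cue 3 from the example above.
gap_ms = srt_time_to_ms("00:00:09,500") - srt_time_to_ms("00:00:09,220")
print(gap_ms)  # 280
```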
AWS Polly references: https://docs.aws.amazon.com/polly/latest/dg/ssml-to-speech-console.html
If there's no built-in way to convert .srt to SSML-enhanced text, should we parse the .srt files ourselves and generate SSML that Polly can understand?
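For reference, this is roughly the parsing approach we have in mind, as a minimal Python sketch rather than a finished tool. It assumes standard SRT input with a blank line between cues, and it turns each gap between one cue's end and the next cue's start into an SSML `<break>` tag. The names `parse_srt`, `srt_to_ssml`, and the `min_gap_ms` threshold are our own inventions.

```python
import re
from xml.sax.saxutils import escape

# Matches a timing line such as "00:00:06,880 --> 00:00:09,220".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def _to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt(srt_text):
    """Return a list of (start_ms, end_ms, text) tuples, one per cue.

    Assumes cues are separated by blank lines, as in standard SRT.
    """
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                cues.append((_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues

def srt_to_ssml(srt_text, min_gap_ms=50):
    """Join cue texts into one SSML document, inserting a <break> for each
    gap of at least min_gap_ms (hypothetical threshold) between cues."""
    parts = []
    prev_end = None
    for start, end, text in parse_srt(srt_text):
        if prev_end is not None and start - prev_end >= min_gap_ms:
            parts.append(f'<break time="{start - prev_end}ms"/>')
        parts.append(escape(text))  # escape &, <, > so the SSML stays valid XML
        prev_end = end
    return "<speak>" + " ".join(parts) + "</speak>"
```

The resulting string could presumably be passed to Polly with the text type set to SSML, for example via boto3's `synthesize_speech(Text=ssml, TextType="ssml", ...)`. Two caveats with this sketch: speaker labels such as `- [Instructor]` are left in the text and would be read aloud unless stripped, and only the pauses between cues are preserved, since the synthesized speech will not necessarily last as long as each cue's original duration.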