I need to calculate the speech rate for each subtitle line. The content of the SRT (subtitle) file looks like this:
1
00:00:19,000 --> 00:00:21,989
I'm Annita McVeigh and welcome to Election Today where we'll bring you
2
00:00:22,000 --> 00:00:23,989
the latest from the campaign trail, plus debate and analysis.
3
00:00:24,000 --> 00:00:28,989
The Liberal Democrats promise to protect the pay of millions
For example, the third subtitle runs for 4 seconds 989 milliseconds (00:00:24,000 to 00:00:28,989), during which the 10 words "The Liberal Democrats promise to protect the pay of millions" are spoken. The average speech rate for these 10 words is therefore 4989 / 10 = 498.9 milliseconds per word.
How do I read the SRT file into a data frame with startTime, endTime, textString and wordCount as columns and one row per subtitle, like the one below?
startTime <- c("00:00:19,000", "00:00:22,000", "00:00:24,000")
endTime <- c("00:00:21,989", "00:00:23,989", "00:00:28,989")
textString <- c("I'm Annita McVeigh and welcome to Election Today where we'll bring you", "the latest from the campaign trail, plus debate and analysis.", "The Liberal Democrats promise to protect the pay of millions")
wordCount <- c(12, 10, 10)
rate.df <- data.frame(startTime, endTime, textString, wordCount)
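One possible way to build such a data frame in base R is sketched below. It assumes every cue has a timing line containing "-->" and that the line just before each timing line is the cue index. The helper name read_srt is made up for illustration, and the sample lines are typed in rather than read from a file.

```r
# Hypothetical helper (not part of any package): turn SRT lines into a data frame.
read_srt <- function(lines) {
  lines <- trimws(lines)
  # Timing lines look like "00:00:19,000 --> 00:00:21,989"
  timing_idx <- grep("-->", lines, fixed = TRUE)
  times <- strsplit(lines[timing_idx], " --> ", fixed = TRUE)
  startTime <- vapply(times, `[`, character(1), 1)
  endTime   <- vapply(times, `[`, character(1), 2)
  # The text of a cue starts right after its timing line and stops before
  # the next cue's index line (two lines before the next timing line).
  text_start <- timing_idx + 1
  text_end   <- c(timing_idx[-1] - 2, length(lines))
  textString <- mapply(function(s, e) {
    if (s > e) return("")
    txt <- lines[s:e]
    paste(txt[nzchar(txt)], collapse = " ")  # drop blank separator lines
  }, text_start, text_end)
  # Count words as whitespace-separated tokens
  wordCount <- lengths(strsplit(textString, "\\s+"))
  data.frame(startTime, endTime, textString, wordCount,
             stringsAsFactors = FALSE)
}

# Sample input typed in by hand; for a real file, readLines() on its path
# would supply the lines instead.
srt_lines <- c(
  "1",
  "00:00:19,000 --> 00:00:21,989",
  "I'm Annita McVeigh and welcome to Election Today where we'll bring you",
  "2",
  "00:00:22,000 --> 00:00:23,989",
  "the latest from the campaign trail, plus debate and analysis.",
  "3",
  "00:00:24,000 --> 00:00:28,989",
  "The Liberal Democrats promise to protect the pay of millions"
)
parsed.df <- read_srt(srt_lines)
```

Multi-line cues are joined with a single space here, which is a design choice for word counting rather than anything the SRT format requires.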
Also, how do I subtract startTime from endTime in R when the times are given in the form hour:minute:second,millisecond?
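For the subtraction, one simple approach is to convert each timestamp to milliseconds and subtract the numbers. This is a minimal sketch in base R; the helper name srt_to_ms is an assumption made for illustration, and it expects every timestamp to have exactly the hour:minute:second,millisecond layout.

```r
# Hypothetical helper: convert "HH:MM:SS,mmm" strings to milliseconds.
srt_to_ms <- function(x) {
  parts <- strsplit(x, "[:,]")  # split on ":" and ","
  vapply(parts, function(p) {
    p <- as.numeric(p)
    ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
  }, numeric(1))
}

startTime <- c("00:00:19,000", "00:00:22,000", "00:00:24,000")
endTime   <- c("00:00:21,989", "00:00:23,989", "00:00:28,989")
wordCount <- c(12, 10, 10)

# Duration of each subtitle in milliseconds, then milliseconds per word
durationMs <- srt_to_ms(endTime) - srt_to_ms(startTime)
msPerWord  <- durationMs / wordCount
```

With the three sample subtitles, the third entry of msPerWord matches the 498.9 milliseconds per word worked out by hand.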