
I have a text script that we use to create podcasts, so the words in the podcast audio are exactly the same as in my text. What I want to get is the following:

Word in text | Pronunciation started at
Hello        | 0:0:0.000
my           | 0:0:1.125
friends      | 0:0:2.750

Is that possible to do at all? Thanks in advance!

Nikolay Shmyrev
Max Koretskyi

1 Answer


A good keyword to start researching this problem is "forced alignment". This site also covers questions on the topic, e.g. here, which, via the related threads, leads you to questions and answers about HTK (the Hidden Markov Model Toolkit).
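If you go the HTK route, the aligner writes its result as label files whose lines read `start end label`, with times in HTK's 100-nanosecond units. A minimal sketch for turning such lines into (word, start in seconds) pairs follows. The function name `parse_htk_labels` is hypothetical, and the default `skip` names `sil` and `sp` are only assumed silence/pause model names; use whatever names your own setup defines.

```python
def parse_htk_labels(text, skip=("sil", "sp")):
    """Return (label, start_seconds) pairs from HTK-style label lines.

    Each line is "start end label [more fields...]", with times in
    HTK's 100-nanosecond units. Labels listed in `skip` (assumed
    silence/pause model names) are dropped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and MLF-style lines such as "#!MLF!#", a quoted
        # file name, or the "." terminator: none of them start with a time.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        start, label = int(fields[0]), fields[2]
        if label in skip:
            continue
        rows.append((label, start / 1e7))  # 10**7 units of 100 ns per second
    return rows


labels = """0 1000000 sil
1000000 11250000 Hello
11250000 27500000 my
27500000 35000000 friends
"""
for word, start in parse_htk_labels(labels):
    print(word, start)
```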

You can find a more hands-on description of how forced alignment is used for automated audio segmentation here.

So the answer is: yes, it is possible, but it is algorithmically very complex, and even the best implementations are not error-free.
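The core idea behind forced alignment fits in a few lines: because the transcript is known, the aligner does not have to recognise the words, only decide where each one starts. Below is a toy Viterbi-style dynamic program over the fixed word order. It assumes you already have a score for every (frame, word) pair; the `frame_scores` matrix, the 0.125 s frame length and the example values are all hypothetical. Real aligners derive such scores from trained acoustic models, which is where most of the complexity and the errors come from.

```python
import math


def force_align(words, frame_scores):
    """Viterbi-style forced alignment over a known word sequence.

    frame_scores[t][i] is a (hypothetical) log-likelihood that frame t
    belongs to words[i]. Words must appear in order, each spanning at
    least one frame. Returns the start frame index of each word.
    """
    n_frames, n_words = len(frame_scores), len(words)
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")
    neg_inf = -math.inf
    # best[t][i]: best total score of frames 0..t with frame t in word i.
    best = [[neg_inf] * n_words for _ in range(n_frames)]
    back = [[0] * n_words for _ in range(n_frames)]  # word index at frame t-1
    best[0][0] = frame_scores[0][0]  # the first frame must belong to word 0
    for t in range(1, n_frames):
        for i in range(n_words):
            stay = best[t - 1][i]  # frame t continues word i
            advance = best[t - 1][i - 1] if i > 0 else neg_inf  # word i starts
            if stay >= advance:
                best[t][i], back[t][i] = stay, i
            else:
                best[t][i], back[t][i] = advance, i - 1
            best[t][i] += frame_scores[t][i]
    # Trace back from the last frame, which must belong to the last word.
    starts = [0] * n_words
    i = n_words - 1
    for t in range(n_frames - 1, 0, -1):
        prev = back[t][i]
        if prev != i:
            starts[i] = t  # word i begins at frame t
        i = prev
    return starts


words = ["Hello", "my", "friends"]
# Toy evidence: frames 0-2 look like "Hello", 3-4 like "my", 5-8 like "friends".
evidence = [0, 0, 0, 1, 1, 2, 2, 2, 2]
frame_scores = [[0.0 if i == e else -5.0 for i in range(len(words))] for e in evidence]
FRAME_SECONDS = 0.125  # hypothetical frame length
for word, start in zip(words, force_align(words, frame_scores)):
    print(word, start * FRAME_SECONDS)
```

Each frame can only stay in the current word or move on to the next one, so the search space is far smaller than in free speech recognition. This restriction is what makes alignment against a known script feasible.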

PS: I found a really simple tool for you.

Hartmut Pfitzinger
  • Thanks a lot for your answer! I'll take a look at the links you provided. I don't have any experience with audio processing, though. Do you know of any tools I can use to do this without diving into the specifics of audio processing? Or maybe you know someone who might be interested in working on this, or a website where I can contract someone for the job? Thanks in advance! – Max Koretskyi Jun 28 '14 at 16:55
  • Yes, there are tools, but as far as I know they are really hard to operate if you're not an expert (as with all tools, I would say). The people I know earn money with this kind of job because they know how to use and tweak these tools. But let me think; maybe I'll remember a freeware tool that gives acceptable results for English. – Hartmut Pfitzinger Jun 28 '14 at 17:16
  • Thanks, that would be great. Those people you know who `earn money with this kind of job because they know how to use and tweak these tools`: what do you think they would charge per hour of consulting? I basically need a simple system: I put in text and get the alignment back. We are going to create the recordings from the text ourselves, so we can apply any recommended techniques that make alignment easier. – Max Koretskyi Jun 28 '14 at 17:21
  • See the PS in my answer. I suppose you won't get away with an hour of consulting; it would more likely be half a day and a lot of money. The Aligner should really do what you want, and there are tons of tutorials. – Hartmut Pfitzinger Jun 28 '14 at 18:18
  • Yes, I just watched the video on how to use the tool you found. It seems easy enough. Thanks a lot for your help! I hope this will help me solve my task. But I'm still wondering what half a day of consulting would cost me. Could you give a rough figure, just so I have an idea of the possible costs? – Max Koretskyi Jun 28 '14 at 18:36
  • I couldn't find a way to send private messages on Stack Overflow, but you will find my email address in the impressum of my website. Good luck! – Hartmut Pfitzinger Jun 28 '14 at 18:41
  • Thanks! I've sent you an email. Please check. – Max Koretskyi Jun 28 '14 at 19:13
  • For the record, I maintain the AGPL-licensed, stand-alone forced aligner aeneas: https://github.com/readbeyond/aeneas/ – Alberto Pettarin Jul 30 '16 at 09:26
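With a forced aligner such as aeneas, the remaining step is turning its output into the table from the question. The sketch below makes two assumptions. First, the result was exported as a JSON sync map with a top-level `fragments` list whose entries carry `begin` (seconds, stored as a string) and `lines`. Second, the input text had one word per line, so each fragment is one word. Check both against the output of the version you actually run. The function names are hypothetical, and the sample string is hand-written to match the question, not real aeneas output.

```python
import json


def format_timestamp(seconds):
    """Format seconds as H:M:S.mmm, the style used in the question's table."""
    millis = round(seconds * 1000)
    h, rem = divmod(millis, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h}:{m}:{s}.{ms:03d}"


def word_table(sync_map_json):
    """Build (word, start) rows from an aeneas-style JSON sync map.

    Assumes one word per text line, so each fragment's "lines" holds a
    single word and "begin" is its start time in seconds.
    """
    data = json.loads(sync_map_json)
    rows = []
    for fragment in data["fragments"]:
        word = " ".join(fragment["lines"])
        rows.append((word, format_timestamp(float(fragment["begin"]))))
    return rows


# Hand-written sample in the assumed layout (not real aligner output).
sample_sync_map = """{"fragments": [
  {"begin": "0.000", "end": "1.125", "id": "f000001", "lines": ["Hello"]},
  {"begin": "1.125", "end": "2.750", "id": "f000002", "lines": ["my"]},
  {"begin": "2.750", "end": "3.500", "id": "f000003", "lines": ["friends"]}
]}"""

print("Word in text | Pronunciation started at")
for word, start in word_table(sample_sync_map):
    print(f"{word:<13}| {start}")
```

Parsing the JSON, rather than scraping a text format, keeps the sketch independent of column widths. Only the two assumed keys are read.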