0

I would like to extract subtitles from video files eventually.

My current video files are stored on a local disk, so I will treat them as train/test data. But imagine that I have a running web app where I upload a fresh video, and the web app should extract the subtitles at upload time. I want it to be as accurate as one of these decoders can be :) Please advise.

Novitoll

1 Answer

3

You need to use Kaldi

With its implementation of modern speech recognition algorithms (deep neural networks and WFST search), Kaldi is much more accurate (> 50%) and much faster. Neither of those is implemented in sphinx4 or pocketsphinx.
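
Whichever decoder you end up with, the upload-time flow from the question usually needs two pieces of glue around it: pulling the audio track out of the video, and turning timed recognition results into a subtitle file. Below is a minimal sketch of just that glue, not of Kaldi itself. It assumes `ffmpeg` is installed and that the acoustic model expects 16 kHz mono 16-bit PCM (check what your model was trained on). The helper names `ffmpeg_audio_command`, `extract_audio`, `srt_timestamp` and `to_srt` are made up for illustration, and the `(start, end, text)` segment format is an assumption about what you would collect from the decoder output.

```python
import subprocess


def ffmpeg_audio_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono 16-bit PCM audio from a video.

    The 16 kHz default is an assumption; use the rate your model expects.
    """
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", video_path,          # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # resample
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(ffmpeg_audio_command(video_path, wav_path, sample_rate), check=True)


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Turn (start_seconds, end_seconds, text) tuples into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

In the web app you would call `extract_audio(...)` on the uploaded file, feed the resulting WAV to the recognizer, and pass the timed results to `to_srt(...)`.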

Nikolay Shmyrev
  • Wow, I'm not very familiar with the Sphinx architecture yet, but I gathered that its acoustic models are based on hidden Markov models. Thanks, I will take a look at Kaldi, but then my obvious question is: what is CMU Sphinx's competition? I guess I should ask a separate question on "Sphinx vs Kaldi". Thanks again – Novitoll Oct 18 '16 at 11:47
  • You can ask such a question, but not on Stack Overflow. Questions asking to recommend or find a tool are not welcome here. – Nikolay Shmyrev Oct 18 '16 at 12:02