Extract (for OCR) hard-coded video subtitles

Question

Many Korean videos on YouTube have hard-coded subtitles (e.g. https://youtu.be/Zyd6hAvxTnc).

The desired end result would be the OCR'd subtitles in text format.

My current process is semi-manual: I download the video with yt-dlp, use ffmpeg to extract frames (e.g. one per second), bulk-crop them to fixed dimensions with ImageMagick (hoping the subtitles don't wrap onto multiple lines...), OCR the crops with Tesseract, then remove duplicates. Tesseract gives mixed results; PowerToys' Text Extractor seems much better, but it's very manual.

It's not a great solution.
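
For reference, the semi-manual process described above could be sketched as a single Python script along these lines. This is a minimal sketch under assumptions, not a tested pipeline: the function names, the crop rectangle, and the 0.8 similarity threshold are all hypothetical; it swaps the ImageMagick step for ffmpeg's own `crop` filter; and it assumes `ffmpeg` and `tesseract` (with Korean language data installed) are on the PATH. Fuzzy de-duplication with `difflib` is one possible way to collapse OCR output that differs only slightly between consecutive frames.

```python
import difflib
import subprocess
from pathlib import Path


def ffmpeg_frames_cmd(video, out_dir, crop, fps=1):
    """Build an ffmpeg command that samples `fps` frames per second and crops each one.

    `crop` is (width, height, x, y) in pixels; the values must be tuned per video.
    """
    w, h, x, y = crop
    return [
        "ffmpeg", "-i", str(video),
        "-vf", f"fps={fps},crop={w}:{h}:{x}:{y}",
        str(Path(out_dir) / "frame_%05d.png"),
    ]


def tesseract_cmd(image, lang="kor", psm=6):
    """Build a tesseract command that prints the recognised text to stdout.

    psm 6 treats the image as a single uniform block of text (an assumption
    that suits a tightly cropped subtitle area).
    """
    return ["tesseract", str(image), "stdout", "-l", lang, "--psm", str(psm)]


def dedupe_consecutive(lines, threshold=0.8):
    """Drop empty lines and lines that are near-identical to the last kept line."""
    kept = []
    for line in lines:
        text = " ".join(line.split())  # normalise whitespace and newlines
        if not text:
            continue
        if kept and difflib.SequenceMatcher(None, kept[-1], text).ratio() >= threshold:
            continue  # same subtitle still on screen (possibly with OCR noise)
        kept.append(text)
    return kept


def extract_subtitles(video, out_dir, crop):
    """Run the whole pipeline: frames -> OCR -> de-duplicated subtitle lines."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frames_cmd(video, out_dir, crop), check=True)
    texts = []
    for image in sorted(Path(out_dir).glob("frame_*.png")):
        result = subprocess.run(
            tesseract_cmd(image), check=True, capture_output=True, text=True
        )
        texts.append(result.stdout)
    return dedupe_consecutive(texts)
```

A call such as `extract_subtitles("video.mp4", "frames", (1280, 120, 0, 600))` would then return the list of subtitle lines; the file name and crop values here are placeholders.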

I've tried using OpenCV but without success.

Does anyone know of either:

a) a tool that does this automatically

b) a better way to automate this process (ideally into a single Python script, ideally with automatic detection of the subtitles rather than fixed cropping).
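
To illustrate what "automatic detection" in (b) might look like, here is a rough sketch of one simple heuristic: a row-wise brightness profile. It assumes the subtitles are bright text in the lower half of the frame, which will not hold for every video, and bright scenery will produce false positives. The function name, thresholds, and input format (a grayscale frame as rows of 0-255 values, e.g. a 2-D array from an image library) are assumptions for illustration only.

```python
def find_subtitle_band(gray, bright=200, min_fraction=0.02, search_from=0.5):
    """Return (top, bottom) row indices of the tallest run of 'text-like' rows.

    A row counts as text-like when at least `min_fraction` of its pixels are
    at or above the `bright` threshold. Only rows below `search_from`
    (a fraction of the frame height) are examined. `bottom` is exclusive.
    Returns None when no such row is found.
    """
    height = len(gray)
    start = int(height * search_from)
    best = None
    run_start = None
    # Iterate one step past the last row so a run touching the bottom edge is closed.
    for y in range(start, height + 1):
        is_text = False
        if y < height:
            row = gray[y]
            bright_count = sum(1 for p in row if p >= bright)
            is_text = bright_count >= min_fraction * len(row)
        if is_text and run_start is None:
            run_start = y
        elif not is_text and run_start is not None:
            if best is None or (y - run_start) > (best[1] - best[0]):
                best = (run_start, y)
            run_start = None
    return best
```

The returned band could replace the fixed crop height and offset, ideally after combining the result over several frames, since a single frame may contain no subtitle at all.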

Thanks!

Looking online, I found this [Master's Thesis](https://liu.diva-portal.org/smash/get/diva2:1331490/FULLTEXT01.pdf) by Jonathan Sjölund, which describes a related problem, detecting frozen captions. As part of detecting frozen captions, the author needed to detect captions, and he describes ten techniques for detecting captions in section 2.2.2. — Nick ODell, Aug 01 '23 at 15:47
@NickODell So this is Master's-level material? Wow. Having a read, I think it's a bit above my head haha. Thanks though! — jamesdeluk, Aug 08 '23 at 15:21


0 Answers