Many Korean videos on YouTube have hard-coded subtitles (e.g. https://youtu.be/Zyd6hAvxTnc).
The desired end result would be the OCR'd subtitles in text format.
I have a semi-manual process:
1) download the video with yt-dlp;
2) use ffmpeg to extract images (e.g. one frame per second);
3) bulk-crop them to fixed dimensions with ImageMagick (hoping the subtitles don't wrap onto multiple lines...);
4) OCR them with Tesseract (with mixed results; the PowerToys Text Extractor seems much better, but it's very manual);
5) remove duplicates.
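For reference, the steps above can be chained into one Python script roughly like this. It is a minimal sketch, assuming yt-dlp, ffmpeg and tesseract (with the Korean `kor` traineddata) are installed and on PATH. It swaps the ImageMagick step for ffmpeg's `crop` filter so extraction and cropping happen in one pass. The crop box `CROP`, the `--psm 6` page-segmentation mode and the 0.8 similarity threshold in the hypothetical `collapse_duplicates` helper are placeholder choices, not tuned values. It still relies on fixed cropping, so it does not address the multi-line or detection problems.

```python
import difflib
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical crop box (width, height, x, y) in pixels for a 1920x1080 video;
# adjust per video. ffmpeg fails if the box extends past the frame.
CROP = (1280, 160, 320, 900)


def ffmpeg_cmd(video: Path, out_dir: Path, crop=CROP, fps: int = 1) -> list[str]:
    """Sample `fps` frames per second and crop them in a single ffmpeg pass
    (assumption: ffmpeg's crop filter stands in for the ImageMagick step)."""
    w, h, x, y = crop
    return [
        "ffmpeg", "-loglevel", "error", "-i", str(video),
        "-vf", f"fps={fps},crop={w}:{h}:{x}:{y}",
        str(out_dir / "%05d.png"),
    ]


def tesseract_cmd(image: Path, lang: str = "kor") -> list[str]:
    """OCR one image; 'stdout' makes tesseract print the text instead of writing a file.
    --psm 6 treats the crop as one uniform block, so two-line subtitles are still read."""
    return ["tesseract", str(image), "stdout", "-l", lang, "--psm", "6"]


def collapse_duplicates(lines, threshold: float = 0.8):
    """Merge consecutive near-identical OCR results into (start_s, end_s, text) cues.

    `lines` holds one OCR string per sampled frame (index i is roughly second i at
    fps=1); empty strings mean no subtitle was read on that frame.
    """
    cues = []
    for second, text in enumerate(lines):
        text = " ".join(text.split())  # normalise whitespace and newlines
        if not text:
            continue
        adjacent = cues and cues[-1][1] == second
        if adjacent and difflib.SequenceMatcher(None, cues[-1][2], text).ratio() >= threshold:
            cues[-1][1] = second + 1  # same subtitle still on screen: extend the cue
        else:
            cues.append([second, second + 1, text])
    return [tuple(c) for c in cues]


def main(url: str) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Video-only stream is enough, since only frames are needed.
        subprocess.run(["yt-dlp", "-f", "bv*", "-o", str(tmp_dir / "video.%(ext)s"), url],
                       check=True)
        video = next(tmp_dir.glob("video.*"))
        frames = tmp_dir / "frames"
        frames.mkdir()
        subprocess.run(ffmpeg_cmd(video, frames), check=True)
        texts = [
            subprocess.run(tesseract_cmd(png), capture_output=True,
                           encoding="utf-8", check=True).stdout
            for png in sorted(frames.glob("*.png"))
        ]
    for start, end, text in collapse_duplicates(texts):
        print(f"{start:>5}s-{end:>5}s  {text}")


if __name__ == "__main__":
    main(sys.argv[1])
```

Saved under a hypothetical name such as `subs.py`, it would be run as `python subs.py https://youtu.be/Zyd6hAvxTnc` and print one line per cue with approximate start and end seconds. Comparing consecutive OCR strings with `difflib` rather than exact equality is meant to absorb small OCR differences between frames that show the same subtitle.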
It's not a great solution.
I've tried using OpenCV but without success.
Does anyone know of either:
a) a tool that does this automatically
b) a better way to automate this process (ideally as a single Python script that detects the subtitles automatically rather than relying on fixed cropping).
Thanks!