Imagine a short video clip like this: a black background with a line of white text in the center that is gradually filled with red. The fill progresses not only letter by letter; each individual letter is also filled gradually. Here is a simplified image that illustrates this:
(There are a number of frames in between, but they are omitted for simplicity.)
After some time (e.g. 10 seconds), the whole string is red.
Here is the task I have to solve:
- I have to recognize the initial string; in this example, the result should be "HELLO WORLD".
- For every letter, I also have to find out when it starts getting filled and when it is completely filled.
The output might look like this:
H,0ms,1000ms
E,1000ms,1500ms
L,1500ms,2500ms
L,2500ms,3500ms
O,3500ms,4000ms
... and so on.
The fill speed may vary from letter to letter. The typeface and font size are always the same. The character set includes both lowercase and uppercase letters.
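Whatever method ends up measuring the fill, turning per-frame measurements into output lines like the ones above is the easy part. A minimal sketch, assuming you already have, for each letter, a list of fill fractions (0.0 = all white, 1.0 = all red), one per frame, plus a known constant frame rate with frame 0 at 0 ms. `fill_interval_ms`, `format_line`, and the tolerance `eps` are hypothetical names chosen for illustration. Because the speed varies per letter, each letter is timed independently from its own series, and the timing resolution is one frame (40 ms at 25 fps).

```python
from typing import Optional, Sequence, Tuple


def fill_interval_ms(
    fractions: Sequence[float], fps: float, eps: float = 0.02
) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) for one letter from its per-frame fill fractions.

    start: first frame whose fill fraction exceeds eps (filling has begun)
    end:   first frame whose fill fraction reaches 1 - eps (completely filled)
    Returns None if the letter never starts or never finishes filling.
    Timestamps are frame-accurate only: the true event lies within one frame period.
    """
    start = next((i for i, f in enumerate(fractions) if f > eps), None)
    end = next((i for i, f in enumerate(fractions) if f >= 1.0 - eps), None)
    if start is None or end is None:
        return None
    ms_per_frame = 1000.0 / fps
    return round(start * ms_per_frame), round(end * ms_per_frame)


def format_line(letter: str, interval: Tuple[int, int]) -> str:
    # Produces the "H,0ms,1000ms" style used in the expected output
    return f"{letter},{interval[0]}ms,{interval[1]}ms"


# Hypothetical fill fractions for one letter, sampled at 25 fps
series = [0.0, 0.0, 0.1, 0.5, 0.9, 1.0, 1.0]
print(format_line("H", fill_interval_ms(series, fps=25)))  # H,80ms,200ms
```

The small tolerance keeps anti-aliased edge pixels from making a letter look "started" too early or "never finished".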
I considered two approaches: OCR or a neural network. I have little experience with either.
I assume that OCR would let me recognize the text easily. But how do I tell unfilled letters from filled ones?
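Telling filled from unfilled pixels may not need OCR at all, because the colors are known: white text, red fill, black background. A minimal sketch, assuming each frame is an RGB NumPy array of shape (height, width, 3) with `uint8` values, and that you already have a bounding box `(x, y, w, h)` for each letter (for example from an OCR engine that reports per-character boxes). The function name `fill_fraction` and the threshold `bright` are hypothetical. Note that OpenCV's `cv2.imread` and `cv2.VideoCapture` return frames in BGR order, so the channels would need swapping first.

```python
import numpy as np


def fill_fraction(frame: np.ndarray, box: tuple, bright: int = 128) -> float:
    """Fraction of a letter's text pixels that are red (0.0 = unfilled, 1.0 = filled).

    frame:  RGB image, shape (H, W, 3), dtype uint8 (assumed; convert from BGR if needed)
    box:    (x, y, w, h) bounding box of one letter
    bright: assumed threshold separating "on" from "off" channel values
    """
    x, y, w, h = box
    patch = frame[y:y + h, x:x + w]
    r, g, b = patch[..., 0], patch[..., 1], patch[..., 2]
    # White text: all three channels bright. Red fill: only the red channel bright.
    white = (r >= bright) & (g >= bright) & (b >= bright)
    red = (r >= bright) & (g < bright) & (b < bright)
    text_pixels = int(white.sum() + red.sum())
    # Black background pixels are ignored, so the result is relative to the glyph's area
    return float(red.sum()) / text_pixels if text_pixels else 0.0
```

Calling it for each letter box in each frame gives a per-letter fill curve over time; filling starts where that curve leaves 0 and ends where it reaches 1. Anti-aliased edge pixels (grey or pink) fall on one side of the threshold or the other, so the curve may not hit exactly 0 or 1 and a small tolerance is advisable.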
The neural network approach would probably let me recognize both unfilled and filled letters, but it requires splitting the image into separate letters, which might be a complex task in itself.
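Splitting the line into letters may be simpler than it looks, since the background is uniformly black and the typeface never changes. A minimal sketch of one classic technique, vertical projection: mark every column that contains any text pixel and treat each run of such columns as one letter. It assumes an RGB `uint8` frame and relies on the fact that both white (255, 255, 255) and red (255, 0, 0) have a bright red channel, so partially filled letters are detected exactly like unfilled ones. `split_letters` and `bright` are hypothetical names. Letters that touch or overlap horizontally (tight kerning such as "AV", or ligatures) would be merged into one box and need special handling.

```python
import numpy as np


def split_letters(frame: np.ndarray, bright: int = 128) -> list:
    """Return one (x, y, w, h) box per run of columns that contain text pixels."""
    # The red channel is bright for both white and red pixels, dark for the black background
    ink = frame[..., 0] >= bright
    has_ink = ink.any(axis=0)  # True for every column containing part of a letter
    boxes = []
    x, width = 0, has_ink.shape[0]
    while x < width:
        if not has_ink[x]:
            x += 1
            continue
        start = x
        while x < width and has_ink[x]:
            x += 1
        # Tighten the box vertically to the rows that actually contain ink
        rows = np.flatnonzero(ink[:, start:x].any(axis=1))
        boxes.append((start, int(rows[0]), x - start, int(rows[-1] - rows[0] + 1)))
    return boxes
```

The gap between words shows up as a wider gap between consecutive boxes, so comparing gap widths (for example, flagging gaps much wider than the median) is one way to recover the space in "HELLO WORLD".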
Are there any other options? Or, of the two options above, which one would you recommend, and how would you work around the issues outlined for each approach?
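Because the typeface and font size never change, one option that needs neither a general-purpose OCR engine nor a trained network is template matching: keep one reference mask per character and label each cropped letter with the template it overlaps best. A minimal sketch, assuming letters have already been cropped into boolean "ink" masks (text pixels, whether white or red, set to `True`) and that a `templates` dictionary has been built once, for instance by hand-labeling the crops from a single frame. `classify` and `pad_to` are hypothetical helpers, and intersection-over-union is just one possible similarity score.

```python
import numpy as np


def pad_to(mask: np.ndarray, shape: tuple) -> np.ndarray:
    # Place the mask in the top-left corner of an all-False array of the given shape
    out = np.zeros(shape, dtype=bool)
    out[: mask.shape[0], : mask.shape[1]] = mask
    return out


def classify(mask: np.ndarray, templates: dict) -> str:
    """Return the key of the template whose mask best overlaps `mask` (highest IoU)."""
    best_char, best_score = "", -1.0
    for char, tmpl in templates.items():
        # Pad both masks to a common size so they can be compared pixel by pixel
        shape = (max(mask.shape[0], tmpl.shape[0]), max(mask.shape[1], tmpl.shape[1]))
        a, b = pad_to(mask, shape), pad_to(tmpl, shape)
        union = np.logical_or(a, b).sum()
        score = np.logical_and(a, b).sum() / union if union else 0.0
        if score > best_score:
            best_char, best_score = char, score
    return best_char


# Toy templates: a vertical bar and a horizontal bar
templates = {"I": np.ones((5, 1), dtype=bool), "-": np.ones((1, 5), dtype=bool)}
print(classify(np.ones((5, 1), dtype=bool), templates))  # I
```

Crops are aligned at their top-left corner, which discards the baseline position. Characters that look identical in the chosen font (for example "l" and "I" in many sans-serif faces) cannot be told apart this way and would need an extra cue, such as the letter's vertical position relative to the rest of the line.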