What is a good way to find discriminative keyframes in a video?

Question

I need to extract a single "keyframe" from a video of a particular human action (the actions could be generic) such that it is discriminative rather than descriptive (see "Finding an interesting frame in a video").

In short, I need to find the one frame in a basketball video that discriminates it from, say, a coffee-drinking video.

Most of the papers I've seen have been some kind of video summarization technique, but the frames thus extracted need not be the best to separate action categories. This is my stumbling block - during test time, I only have the test video to extract a keyframe, yet I need some model which will allow me to extract the frame most different from other action category videos.

score 1 · Accepted Answer · answered Oct 22 '11 at 13:18

Although this is an interesting problem, it sounds ill-defined to me. You want a frame (there's a good chance there'll be more than one, so it's probably incorrect to talk about "the one frame") that distinguishes your test video from other videos, but you don't know what the other videos are. For example, what if your whole set consists of basketball videos? Without knowing (or at least having some reasonable expectation of) what the other videos are, this task is impossible even for a human.

One way I could think of involves a probabilistic model that helps you determine how likely a frame is to be unique or not. You could train this model using some existing video test set: compare all the frames to each other using some similarity measure, and focus on the ones that occur the least frequently. Then apply the model to a different (but similar) test set. YMMV.
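As a rough illustration of the "compare all frames and focus on the rare ones" idea, here is a minimal sketch. It is not a full probabilistic model: it assumes a grayscale intensity histogram as the frame descriptor, an L1 distance as the similarity measure, and the mean distance to the k nearest neighbours as a rarity score. The names `gray_histogram` and `rarity_scores`, and the parameters `k` and `bins`, are hypothetical choices made for this example.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalised intensity histogram of one frame (assumed 8-bit values)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def rarity_scores(frames, k=3, bins=16):
    """Score each frame by its mean L1 distance to its k nearest neighbours.

    A high score means few similar frames exist in the set, i.e. the
    frame occurs infrequently under this (assumed) descriptor.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to compare")
    feats = np.array([gray_histogram(f, bins) for f in frames])
    # Pairwise L1 distances between all frame descriptors.
    dists = np.abs(feats[:, None, :] - feats[None, :, :]).sum(axis=2)
    np.fill_diagonal(dists, np.inf)  # ignore self-matches
    k = min(k, len(frames) - 1)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

The pairwise comparison is quadratic in the number of frames, so for a large training set you would probably subsample frames or use a stronger descriptor with an approximate nearest-neighbour index.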

Lastly, you mentioned that you're interested in action categories, but you're focusing on frames, i.e. still images only. It may be useful to first segment the videos into shots (have a look at the link you posted) and then look for the unique shots. You could then pick your unique frame candidate from the unique shots.
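For the shot segmentation step, one simple possibility is to cut wherever consecutive frames differ strongly. The sketch below assumes grayscale 8-bit frames, an intensity histogram as the descriptor, and a fixed threshold on the L1 histogram difference. The function names and the `threshold` value are hypothetical and would need tuning on real footage.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5, bins=16):
    """Return indices where a new shot is assumed to start."""
    if len(frames) < 2:
        return []
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / max(h.sum(), 1))
    hists = np.array(hists)
    # L1 difference between consecutive histograms, in the range [0, 2].
    diffs = np.abs(hists[1:] - hists[:-1]).sum(axis=1)
    return (np.flatnonzero(diffs > threshold) + 1).tolist()

def split_into_shots(frames, threshold=0.5, bins=16):
    """Return (start, end) index pairs, end exclusive, one per shot."""
    if len(frames) == 0:
        return []
    cuts = shot_boundaries(frames, threshold, bins)
    edges = [0] + cuts + [len(frames)]
    return list(zip(edges[:-1], edges[1:]))
```

Each shot could then be summarised (for example by its mean histogram) and compared against the other shots to find the unusual ones.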

Good luck!

Thanks for the advice! I did think of finding unique frames; in fact, I thought of applying a tf-idf model to score frames. But this does not tie the high-scoring frames to the class label. The reason I mention one frame is that the problem is constrained that way. The videos are going to be atomic actions (e.g. sipping coffee, walking, etc.). – Sau, Oct 26 '11 at 16:12

score 1 · Answer 2 · answered Oct 29 '11 at 21:28

Do the videos have a fixed background (still scene, no camera motion)?

If so, you could use the following naive algorithm:

1. For each video, compute the mean image by averaging each pixel over time (a synthetic representative image).
2. For each video:
   A. For each frame, compute a distance score between it and the representative images of the other videos.
   B. Keep the frame with the highest overall distance (the frame that is most different from the representatives of the other videos).
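The steps above could be sketched roughly as follows. This assumes every video is a NumPy array of shape (T, H, W) or (T, H, W, C) with the same frame size, and it interprets "highest distance overall" as the largest sum of Euclidean distances to the other videos' mean images. Both the interpretation and the function names are assumptions made for this example.

```python
import numpy as np

def mean_image(frames):
    """Average each pixel over time to get a representative image."""
    return np.asarray(frames, dtype=np.float64).mean(axis=0)

def discriminative_keyframe(videos, target):
    """Pick the frame of videos[target] farthest from the other videos.

    Returns (frame_index, scores), where scores[i] is the summed
    Euclidean distance from frame i to every other video's mean image.
    """
    reps = [mean_image(v) for i, v in enumerate(videos) if i != target]
    if not reps:
        raise ValueError("need at least one other video to compare against")
    frames = np.asarray(videos[target], dtype=np.float64)
    scores = np.zeros(len(frames))
    for rep in reps:
        diff = frames - rep  # broadcasts the mean image over time
        scores += np.sqrt((diff ** 2).reshape(len(frames), -1).sum(axis=1))
    return int(np.argmax(scores)), scores
```

Note that at test time this still needs the representative images of the other categories, which would have to come from a training set.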

Joan Charmant

That sounds good! I'll have to try it out. One concern with averaging, though, is that long unimportant actions (say, sitting for a long time before sipping coffee) could dominate the representative image. – Sau Oct 31 '11 at 06:33
