For my project I want to extract frames from a video to make thumbnails. However, every method I find is super slow because it goes through the entire video. Here is what I have tried:

import os

import imageio.v3 as iio

# `v` and `logger` are defined elsewhere in the project.
with iio.imopen(v.file.path, "r") as vObj:
    metadata = iio.immeta(v.file.path, exclude_applied=False)
    # Estimated frame count, minus one second's worth of frames.
    frame_num = int(metadata['fps']*metadata['duration']-metadata['fps'])
    for i in range(10):
        # Frame index at i*10 percent of the video.
        x = int((frame_num/100)*(i*10))
        frame = vObj.read(index=x)
        path = v.get_thumbnail_path(index=i)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        iio.imwrite(path, frame)
        logger.info('Written video thumbnail: {}'.format(path))

For a long video this takes extremely long. I know that videos are compressed across multiple frames; however, if I manually open a video in a player and jump to a point, the player does not have to go through the video frame by frame from the start either.

I don't care about specific frames, just roughly 10%, so sticking to keyframes is fine, if it makes it faster.

How can I quickly grab a frame at every 10% of the video?

Thank you.

JasonTS
  • You need to **seek** to the nearest keyframe (or whatever these things are called these days) instead of to an exact point. Seeking to keyframes is quick. Consult the iio docs. If iio can't do it, you can use PyAV directly; PyAV is a backend to iio. – Christoph Rackwitz Feb 01 '23 at 19:38

1 Answer

The way you are approaching it is correct, and a current PR of mine (#939) will make the performance of calling read multiple times competitive.

Small benchmark:

import imageio.v3 as iio
import numpy as np
from timeit import Timer

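def bench():
    # Open the video with the PyAV backend.
    with iio.imopen("cam1.mp4", "r", plugin="pyav") as file:
        # The first dimension of the video's shape is the number of frames.
        n_frames = file.properties().shape[0]
        # Ten evenly spaced frame indices from the first to the last frame.
        read_indices = np.linspace(0, n_frames-1, 10, dtype=int)
        for count, idx in enumerate(read_indices):
            frame = file.read(index=idx)
            # Assumes the thumbs/ directory already exists.
            iio.imwrite(f"thumbs/thumbnail_{count}.jpg", frame)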

best = min(Timer("bench()", globals=globals()).repeat(5, number=1))
print(f"Best time: {best:.3f}")

Results (in seconds):

Current (v2.25.0):   Best time: 2.134
Future (after #939): Best time: 0.924

The above benchmark uses a small, publicly available video. The real-world gain will depend on the specific video being processed (and on how keyframes are laid out within it). For example, with a longer video (several minutes, not publicly available) the difference is much larger:

Current (v2.25.0):   Best time: 42.952
Future (after #939): Best time: 1.687

This could be even faster with shorter GOP (group of pictures) sizes, but that requires you to have control over how the video is produced and comes at the expense of a larger file size.
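To make the effect of the GOP size concrete, here is a rough cost model in plain Python. It is only a sketch under stated assumptions, not a measurement of ImageIO or PyAV: it assumes a fixed GOP length, that a seeking reader jumps to the keyframe at or before the target and decodes forward from there, and that cost is proportional to the number of frames decoded. All function names are made up for illustration.

```python
def linspace_indices(n_frames, count=10):
    """Evenly spaced frame indices from the first to the last frame."""
    return [(n_frames - 1) * i // (count - 1) for i in range(count)]


def decoded_in_one_pass(targets):
    """Frames decoded when reading from frame 0 up to the last target."""
    return max(targets) + 1


def decoded_with_seek(targets, gop):
    """Frames decoded when each read seeks to the keyframe at or before
    the target and decodes forward (assumes a fixed GOP length)."""
    return sum(t % gop + 1 for t in targets)


# Hypothetical 5-minute video at 25 FPS.
targets = linspace_indices(7500)
one_pass = decoded_in_one_pass(targets)
seek_gop_250 = decoded_with_seek(targets, gop=250)
seek_gop_25 = decoded_with_seek(targets, gop=25)
print(one_pass, seek_gop_250, seek_gop_25)
```

Under these assumptions each seeking read decodes at most one GOP, so the total work is bounded by 10 times the GOP length regardless of how long the video is, and a shorter GOP lowers that bound further.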

> I don't care about specific frames, just roughly 10%, so sticking to keyframes is fine, if it makes it faster.

Reading keyframes isn't a reliable approach here unless you can make some assumptions/assertions about the types of videos you are operating on. Many videos aim for a GOP length of 250 frames but will have shorter/dynamic lengths based on how dynamic the content being encoded is. You can't generally know where keyframes are in advance, so you may end up with arbitrarily skewed results.

For example, a relatively slow-changing video recorded for 10 seconds at 25 FPS (with a realized GOP of 250 frames) will have exactly one keyframe (the first frame), and seeking to the closest keyframe from anywhere in the video will always yield that first frame. This is likely not what you have in mind.
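The skew described above can be reproduced with a few lines of arithmetic. This is a toy simulation, not ImageIO code: it assumes a fixed GOP length (real encoders vary it) and that "the closest keyframe" means the keyframe at or before the target. The helper names are hypothetical.

```python
def thumbnail_targets(n_frames, count=10):
    """Frame indices at 0%, 10%, ..., 90% of the video."""
    return [n_frames * i // count for i in range(count)]


def snap_to_keyframe(target, gop):
    """Index of the keyframe at or before `target`, assuming one
    keyframe every `gop` frames starting at frame 0."""
    return (target // gop) * gop


# 10 seconds at 25 FPS = 250 frames.
targets = thumbnail_targets(250)

# A single 250-frame GOP: every target collapses onto frame 0.
snapped_gop_250 = [snap_to_keyframe(t, 250) for t in targets]

# A 100-frame GOP: only three distinct thumbnails remain.
snapped_gop_100 = [snap_to_keyframe(t, 100) for t in targets]

print(snapped_gop_250)
print(snapped_gop_100)
```

With the single 250-frame GOP all ten thumbnails are the same frame. With a 100-frame GOP there are still only three distinct frames, and they are not evenly spread over the video.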

FirefoxMetzger
  • Wow, that was a lot of insight, thanks! I did have the case, mostly with cellphone videos, where all screenshots turned out the same using ffmpeg's seek. Is it possible to obtain the keyframe density of a file to determine whether accurate frame seeking is necessary? Otherwise I'd assume that keyframes have a higher quality and should be prioritised. – JasonTS Feb 27 '23 at 20:32
  • @JasonTS Not sure I understand the question. If you are thinking of static analysis, you can use `immeta` (or `metadata` inside an `imopen` context) to inspect whether a frame is a keyframe, as well as lots of other useful info about a frame. How much info you can get without decoding the video depends _a lot_ on the container and codec used and on what metadata the creator attached to the container. I am not aware of a generic way to find keyframes without decoding, unfortunately. If I were, I'd probably add it to ImageIO. – FirefoxMetzger Feb 28 '23 at 19:01
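
For reference, the keyframe seeking suggested in the first comment could look roughly like this when using PyAV directly. This is a minimal sketch, not code from the answer: `seek_offsets` and `keyframe_thumbnails` are hypothetical helper names, the container is assumed to report a duration, and the caveat from the answer still applies (a video with few keyframes will yield repeated thumbnails).

```python
def seek_offsets(duration, count=10):
    """Offsets at 0%, 10%, ... of `duration` (same unit as `duration`)."""
    return [duration * i // count for i in range(count)]


def keyframe_thumbnails(path, count=10):
    """Return up to `count` RGB arrays taken from keyframes near each offset."""
    import av  # PyAV; imported here so seek_offsets works without it

    frames = []
    with av.open(path) as container:
        stream = container.streams.video[0]
        # Ask the decoder to skip everything except keyframes.
        stream.codec_context.skip_frame = "NONKEY"
        if container.duration is None:
            raise ValueError("container does not report a duration")
        for offset in seek_offsets(container.duration, count):
            # Without a stream argument the offset is in av.time_base units
            # (microseconds). The default backward=True seeks to a keyframe
            # at or before the offset.
            container.seek(offset)
            frame = next(container.decode(stream), None)
            if frame is not None:
                frames.append(frame.to_ndarray(format="rgb24"))
    return frames
```

The returned arrays can be written with `iio.imwrite` as in the question. Whether the thumbnails end up evenly spread depends entirely on where the encoder placed its keyframes.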