Hi, I wrote a video frame loader Dataset to feed a PyTorch model. I want to sample frames uniformly from each video, and this is the class I came up with. Is there a better way to speed up the sampling process?
Do you have any suggestions, especially for the read_video method?
Thanks

from pathlib import Path

import cv2
import numpy as np
import torch
import torchvision as tv

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class VideoLoader(torch.utils.data.Dataset):

  def __init__(self, data_path, classes, transforms=None, max_frames=None, frames_ratio=None):
    super(VideoLoader, self).__init__()

    self.data_path = data_path
    self.classes = classes
    self.frames_ratio = frames_ratio

    self.transforms = transforms
    self.max_frames = max_frames
  
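  def read_video(self, path):
    frames = []

    # cv2.VideoCapture expects a string, so convert pathlib.Path objects.
    vc = cv2.VideoCapture(str(path))

    total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))

    if self.frames_ratio:
      if isinstance(self.frames_ratio, float):
        # A float is treated as a fraction of the total frame count.
        frames_to_pick = int(total_frames * self.frames_ratio)
      else:
        # Any other value is treated as an absolute number of frames.
        frames_to_pick = self.frames_ratio
    else:
      frames_to_pick = total_frames

    # Uniformly spaced frame positions, cast to int because frame indices are integers.
    idxs = np.linspace(0, total_frames, frames_to_pick, endpoint=False).astype(int)

    for i in idxs:
      # Seek before reading so that the frame returned is the one at index i.
      vc.set(cv2.CAP_PROP_POS_FRAMES, int(i))
      ok, f = vc.read()
      if not ok:
        break
      # Note: OpenCV returns frames in BGR order; if the model expects RGB,
      # convert here with cv2.cvtColor(f, cv2.COLOR_BGR2RGB).
      f = tv.transforms.ToTensor()(f)
      f = self.transforms(f) if self.transforms else f
      frames.append(f)
      if self.max_frames and len(frames) == self.max_frames:
        break
    vc.release()
    return torch.stack(frames)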

  def __getitem__(self, index):
    v_path, label = self.data_path[index]
    return self.read_video(v_path), self.classes[label]

  def __len__(self): return len(self.data_path)

3nomis

2 Answers

0

Because you can't really seek through a single video in parallel, there isn't a much faster sampling process you can run locally. I ran into this problem myself, which is why I started building a simple API for it called Sieve. You can upload data directly to Sieve (from a cloud bucket or from local storage), and it will cut up all the frames for you and tag them with attributes such as motion, people, and objects. It parallelizes the work using serverless functions in the cloud, which makes it fast even for hours or days of footage.

You can then export from Sieve using the dashboard, which gives you a curl command to download the exact samples you want.

Here's a helpful repo: https://github.com/Sieve-Data/automatic-video-processing
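
For anyone who wants to stay local, here is a minimal sketch of one commonly used alternative to seeking: decode the video sequentially with OpenCV's grab() and only call retrieve() on the frames you want to keep. The function names uniform_indices and read_uniform_frames are hypothetical, not part of the question's class or of any library, and whether this beats vc.set(...) depends on the codec, the backend, and how sparse the sampling is, so treat it as something to benchmark rather than a guaranteed speed-up.

```python
def uniform_indices(total_frames, num_samples):
    """Return evenly spaced frame indices in [0, total_frames).

    Hypothetical helper: integer equivalent of
    np.linspace(0, total_frames, num_samples, endpoint=False).
    """
    num_samples = max(0, min(num_samples, total_frames))
    return [(i * total_frames) // num_samples for i in range(num_samples)]


def read_uniform_frames(video_path, num_samples):
    """Read uniformly spaced frames without seeking (hypothetical helper).

    Returns a list of BGR numpy arrays as produced by OpenCV.
    """
    # Imported here so that uniform_indices can be used without OpenCV installed.
    import cv2

    vc = cv2.VideoCapture(str(video_path))
    total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(uniform_indices(total_frames, num_samples))
    last_wanted = max(wanted, default=-1)

    frames = []
    for idx in range(last_wanted + 1):
        # grab() advances to the next frame; it is typically cheaper than
        # read() because the frame is only decoded by retrieve().
        if not vc.grab():
            break
        if idx in wanted:
            ok, frame = vc.retrieve()
            if ok:
                frames.append(frame)
    vc.release()
    return frames
```

Calling read_uniform_frames("clip.mp4", 16) (the file name is only an example) would return up to 16 frames, which can then be converted to tensors the same way as in the question.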

0

If you are happy to extract the frames of each video to disk beforehand, this library is exactly what you're looking for: Video-Dataset-Loading-PyTorch on GitHub: https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch
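
To give an idea of what "extracting the frames to disk beforehand" can look like, here is a rough sketch using plain OpenCV. It is not taken from the linked library; the helper names (uniform_indices, frame_path, extract_frames) and the frame_000123.jpg naming scheme are assumptions made for illustration, and the directory layout the library actually expects should be checked in its README.

```python
from pathlib import Path


def uniform_indices(total_frames, num_samples):
    """Return evenly spaced frame indices in [0, total_frames) (hypothetical helper)."""
    num_samples = max(0, min(num_samples, total_frames))
    return [(i * total_frames) // num_samples for i in range(num_samples)]


def frame_path(out_dir, idx):
    """Build the output file name for a frame index (naming scheme is an assumption)."""
    return Path(out_dir) / f"frame_{idx:06d}.jpg"


def extract_frames(video_path, out_dir, num_samples):
    """Decode the video once and save the sampled frames as JPEG files."""
    # Imported here so that the pure-Python helpers work without OpenCV installed.
    import cv2

    Path(out_dir).mkdir(parents=True, exist_ok=True)

    vc = cv2.VideoCapture(str(video_path))
    total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(uniform_indices(total_frames, num_samples))

    written = []
    idx = 0
    while True:
        ok, frame = vc.read()
        if not ok:
            break
        if idx in wanted:
            target = frame_path(out_dir, idx)
            # imwrite expects BGR input, which is what read() returns.
            if cv2.imwrite(str(target), frame):
                written.append(target)
        idx += 1
    vc.release()
    return written
```

The extraction cost is paid once per video; during training the Dataset then only has to load a handful of small image files per sample instead of decoding the video again every epoch.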

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – tomerpacific Apr 07 '22 at 06:42