
Hi everyone.

Now I have an object-classification task, and I have a dataset containing a large number of videos. In every video, only some frames are labeled (not every frame; about 160 thousand labeled frames in total), and a frame may carry multiple labels since it may contain multiple objects.

I have some confusion about creating the dataset. My plan is to first convert the videos to frames, then store only the labeled frames in tfrecord or hdf5 format. Finally, I would write every frame's path into CSV files (training and validation) for my task.

My questions are: 1. Is this efficient enough (tfrecord or hdf5)? Should I preprocess every frame, e.g. compress it, to save storage space before creating the tfrecord or hdf5 files? 2. Is there a way to handle video datasets directly in TensorFlow or PyTorch?

I want to find an efficient and conventional way to handle video datasets. Really looking forward to your answers.

Jiangang yang

2 Answers


I am no TensorFlow guy, so my answer won't cover that, sorry.

Video formats generally gain compression, at the cost of longer random-access times, by exploiting temporal correlations in the data. This makes sense because one usually accesses video frames sequentially, but if your access pattern is entirely random I suggest you convert to hdf5. Otherwise, if you access sub-sequences of video, it may make sense to stay with video formats.
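As a sketch of what such a conversion could look like (using h5py; the chunk layout, dataset name, and file name are just my choices, and the random array below stands in for real decoded frames, which you would read with e.g. imageio instead):

```python
import h5py
import numpy as np

# Stand-in for decoded video frames; with a real video you would fill
# this from the frames a video reader yields instead.
frames = np.random.randint(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)

with h5py.File('frames.h5', 'w') as f:
    # one chunk per frame, so random access decompresses exactly one frame
    f.create_dataset('frames', data=frames,
                     chunks=(1, 64, 64, 3), compression='gzip')

with h5py.File('frames.h5', 'r') as f:
    frame_42 = f['frames'][42]  # cheap random access to a single frame
```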

PyTorch does not have any "blessed" approaches to video AFAIK, but I use imageio to read videos and seek particular frames. A short wrapper makes it follow the PyTorch Dataset API. The code is rather simple but has one caveat, which is necessary to allow using it with a multiprocessing DataLoader.

import imageio, torch

class VideoDataset:
    def __init__(self, path):
        self.path = path

        # explained in __getitem__
        self._reader = None

        # count_frames() gives an exact count; get_length() may
        # return inf for the ffmpeg plugin
        reader = imageio.get_reader(self.path, 'ffmpeg')
        self._length = reader.count_frames()
        reader.close()

    def __getitem__(self, ix):
        # Below is a workaround to allow using `VideoDataset` with
        # `torch.utils.data.DataLoader` in multiprocessing mode.
        # `DataLoader` sends copies of the `VideoDataset` object across
        # processes, which sometimes leads to bugs, as `imageio.Reader`
        # does not support being serialized. Since our `__init__` set
        # `self._reader` to None, it is safe to serialize a
        # freshly-initialized `VideoDataset` and then, thanks to the if
        # below, `self._reader` gets initialized independently in each
        # worker process.

        if self._reader is None:
            self._reader = imageio.get_reader(self.path, 'ffmpeg')

        # this is a numpy ndarray in [h, w, channel] format
        frame = self._reader.get_data(ix)

        # PyTorch standard layout [channel, h, w]
        return torch.from_numpy(frame.transpose(2, 0, 1))

    def __len__(self):
        return self._length

This code can be adapted to support multiple video files as well as to output the labels as you would like to have them.

Jatentaki

I have been building a simple API called Sieve for exactly this. There seems to be no good way of working with raw video data and building a dataset with just the "interesting" samples. Having to process hours, or sometimes days, of video footage is expensive and time-consuming.

Sieve basically takes care of all of this for you. Upload videos to Sieve from a cloud bucket or from local storage, and use our web app to export and download the exact frames you want - depending on things like motion, objects, and more.

To see how you might upload local videos to Sieve, check out this repo: https://github.com/Sieve-Data/automatic-video-processing