I am no TensorFlow guy, so my answer won't cover that, sorry.
Video formats achieve their compression by exploiting temporal correlations in the data, at the cost of longer random-access times. That trade-off makes sense because video frames are usually read sequentially, but if your access pattern is entirely random I suggest converting to hdf5. If you access sub-sequences of video, however, staying with a video format may still make sense.
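If you do go the hdf5 route, the conversion is straightforward with h5py. Below is a minimal sketch; the function name frames_to_hdf5 and the chunking/compression choices are mine, and the input can be any iterable of [h, w, channel] uint8 frames (e.g. an imageio reader):

```python
import numpy as np
import h5py

def frames_to_hdf5(frames, out_path, chunk_frames=32):
    """Write an iterable of [h, w, channel] uint8 frames into one
    resizable, chunked HDF5 dataset named 'frames'."""
    with h5py.File(out_path, "w") as f:
        dset = None
        for i, frame in enumerate(frames):
            if dset is None:
                # create the dataset lazily, once the frame shape is known
                h, w, c = frame.shape
                dset = f.create_dataset(
                    "frames",
                    shape=(0, h, w, c),
                    maxshape=(None, h, w, c),   # unlimited along time
                    chunks=(chunk_frames, h, w, c),
                    dtype=np.uint8,
                    compression="gzip",
                )
            dset.resize(i + 1, axis=0)
            dset[i] = frame
```

Reading back a single frame is then a cheap chunked read, `f["frames"][ix]`, with no sequential decoding involved.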
PyTorch does not have any "blessed" approach to video AFAIK, but I use imageio to read videos and seek particular frames. A short wrapper makes it follow the PyTorch Dataset API. The code is rather simple, but it has one caveat, which is necessary to allow using it with a multiprocessing DataLoader.
import imageio
import torch

class VideoDataset:
    def __init__(self, path):
        self.path = path

        # explained in __getitem__
        self._reader = None

        reader = imageio.get_reader(self.path, 'ffmpeg')
        # note: on newer imageio versions get_length() may return inf
        # for streams; reader.count_frames() is a more reliable alternative
        self._length = reader.get_length()
        reader.close()

    def __getitem__(self, ix):
        # Below is a workaround to allow using `VideoDataset` with
        # `torch.utils.data.DataLoader` in multiprocessing mode.
        # `DataLoader` sends copies of the `VideoDataset` object across
        # processes, which sometimes leads to bugs, as `imageio.Reader`
        # does not support being serialized. Since our `__init__` sets
        # `self._reader` to None, it is safe to serialize a
        # freshly-initialized `VideoDataset` and then, thanks to the if
        # below, `self._reader` gets initialized independently in each
        # worker process.
        if self._reader is None:
            self._reader = imageio.get_reader(self.path, 'ffmpeg')

        # this is a numpy ndarray in [h, w, channel] format
        frame = self._reader.get_data(ix)

        # PyTorch standard layout [channel, h, w]
        return torch.from_numpy(frame.transpose(2, 0, 1))

    def __len__(self):
        return self._length
This code can be adapted to support multiple video files as well as to output the labels as you would like to have them.
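For the multiple-file case, torch.utils.data.ConcatDataset handles the concatenation out of the box (e.g. ConcatDataset([VideoDataset(p) for p in paths])). For illustration, here is a dependency-free sketch of the index arithmetic it performs; the class name ConcatFrames is mine, and datasets can be any sequence of indexable, sized objects:

```python
import bisect

class ConcatFrames:
    """Expose several indexable datasets as one, mapping a global
    frame index onto (dataset, local index)."""
    def __init__(self, datasets):
        self.datasets = list(datasets)
        # cumulative lengths, e.g. [3, 5] for datasets of lengths 3 and 2
        self.cum = []
        total = 0
        for d in self.datasets:
            total += len(d)
            self.cum.append(total)

    def __len__(self):
        return self.cum[-1] if self.cum else 0

    def __getitem__(self, ix):
        # find the first dataset whose cumulative length exceeds ix
        d_ix = bisect.bisect_right(self.cum, ix)
        local = ix - (self.cum[d_ix - 1] if d_ix else 0)
        return self.datasets[d_ix][local]
```

To also return labels, the natural place is the __getitem__ of the per-file dataset, returning a (frame, label) tuple in whatever form you need.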