I have a dataset in which each datapoint consists of five 2D videos, stored as a NumPy array with shape (48, 128, 42, 5), i.e. (height, width, frames, video index). The five videos serve as "slices" that give some information about depth, although imperfectly.
I want to create a CNN for regression using Keras/TensorFlow, but Keras only has built-in convolutional layers for up to 3 spatial dimensions. Is there a good way to perform convolution and max-pooling on 4-dimensional data, or will I need to write my own layer in TensorFlow?
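
To make the shapes concrete, here is a minimal NumPy sketch of the data layout described above. The batch size of 2 and the zero-filled arrays are placeholders, not real data. It also shows two layouts I could feed to 3D layers, both of which are assumptions on my part rather than a true 4D convolution: keeping the video index as the last axis, which matches the channels-last `(batch, dim1, dim2, dim3, channels)` input that Keras `Conv3D` expects, or moving it next to the batch axis so each video becomes its own single-channel 3D volume.

```python
import numpy as np

# Hypothetical batch size, chosen only for illustration.
batch_size = 2

# One datapoint: (height, width, frames, video index), as described above.
datapoint = np.zeros((48, 128, 42, 5), dtype=np.float32)

# Stack datapoints along a new leading batch axis.
# In this layout the video index sits where a channels-last
# 3D convolution would expect its channel axis.
batch = np.stack([datapoint] * batch_size, axis=0)
print(batch.shape)  # (2, 48, 128, 42, 5)

# Alternative layout: move the video axis next to the batch axis and
# add a trailing channel axis of size 1, so each of the 5 videos is a
# separate (height, width, frames, 1) volume.
per_video = np.moveaxis(batch, -1, 1)[..., np.newaxis]
print(per_video.shape)  # (2, 5, 48, 128, 42, 1)
```

In the first layout the five videos would be mixed by the very first convolution, with no notion of ordering along the depth axis, which is part of why I am asking about a proper 4D operation.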