I'm working on a next-frame prediction problem for 3D spatial data. Something similar, but for 2D data, was discussed here: https://keras.io/examples/vision/conv_lstm/. Basically, I have a set of 3D images that change over time, and I'm trying to build a model to predict these changes.
For my problem I use several ConvLSTM3D layers. The input data is a 6D tensor: [batch, timeframes, x_dimension, y_dimension, z_dimension, channels]. If return_sequences is set to True, the output of ConvLSTM3D is also 6D, which makes it impossible to put a MaxPooling3D layer directly after it, since MaxPooling3D expects a 5D input.
    model.add(layers.ConvLSTM3D(filters=64,
                                kernel_size=(3, 3, 3),
                                padding='same',
                                return_sequences=True,  # output stays 6D
                                activation="tanh",
                                ))
    # This is where it breaks: MaxPooling3D expects 5D input but receives 6D
    model.add(layers.MaxPooling3D(pool_size=(2, 2, 2)))
Any ideas on how to handle that? Is it only possible to use a pooling layer when return_sequences=False?
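One workaround I'm considering, as a minimal sketch rather than a confirmed solution, is to wrap the pooling layer in layers.TimeDistributed so that MaxPooling3D is applied to each timeframe separately and only ever sees a 5D tensor. The small sizes below (5 timeframes, 8x8x8 volumes, 1 channel, 4 filters) are made-up values to keep the example quick, not my real data:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical toy dimensions, chosen only to keep the sketch fast to build.
timeframes, x_dim, y_dim, z_dim, channels = 5, 8, 8, 8, 1

model = keras.Sequential([
    keras.Input(shape=(timeframes, x_dim, y_dim, z_dim, channels)),
    layers.ConvLSTM3D(filters=4,
                      kernel_size=(3, 3, 3),
                      padding="same",
                      return_sequences=True,  # keeps the time axis, so output is 6D
                      activation="tanh"),
    # TimeDistributed applies the wrapped layer to every timeframe independently,
    # so MaxPooling3D receives 5D slices: [batch, x, y, z, channels].
    layers.TimeDistributed(layers.MaxPooling3D(pool_size=(2, 2, 2))),
])

# Time axis is preserved; each spatial axis should be halved by the pooling.
print(model.output_shape)
```

If this is valid, the time axis should survive and only the three spatial axes get pooled, so further ConvLSTM3D layers could be stacked afterwards. I'm not sure whether this is the recommended approach or whether it has drawbacks.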
Thanks