I'm working on a next-frame prediction problem for 3D spatial data. Something similar, but for 2D data, is discussed here: https://keras.io/examples/vision/conv_lstm/. Basically, I have a set of 3D images that change over time, and I'm trying to build a model to predict these changes.

For my problem I use several ConvLSTM3D layers. The input is a 6D tensor: [batch, timeframes, x_dimension, y_dimension, z_dimension, channels]. If return_sequences is set to True, the output of ConvLSTM3D is also 6D, which makes it impossible to use a MaxPooling3D layer afterwards, since it expects a 5D input.

model.add(layers.ConvLSTM3D(filters=64,
                            kernel_size=(3, 3, 3),
                            padding='same',
                            return_sequences=True,
                            activation='tanh'))
# Fails here: MaxPooling3D expects a 5D input but receives the 6D sequence output.
model.add(layers.MaxPooling3D(pool_size=(2, 2, 2)))

Any ideas on how to handle that? Is it only possible to use a pooling layer when return_sequences=False?

Thanks

1 Answer

Sorry for the late answer. I think this might solve your problem:

model.add(layers.TimeDistributed(layers.MaxPooling3D(pool_size=(2,2,2))))

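TimeDistributed applies the wrapped layer to every time step independently. MaxPooling3D therefore sees an ordinary 5D tensor [batch, x, y, z, channels] for each frame, and the result is stacked back into a 6D sequence, so pooling is not limited to return_sequences=False. See here for more information: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed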
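
To check the shapes, here is a minimal sketch, assuming a TensorFlow version that ships ConvLSTM3D. The sizes (4 frames of 8x8x8 volumes, 1 channel, 2 filters) are made up for illustration and are not taken from the question; they are kept small so the model runs quickly.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes only (assumptions, not from the question).
frames, size, channels = 4, 8, 1

model = keras.Sequential([
    keras.Input(shape=(frames, size, size, size, channels)),
    layers.ConvLSTM3D(filters=2,
                      kernel_size=(3, 3, 3),
                      padding="same",
                      return_sequences=True,
                      activation="tanh"),
    # Pool each frame separately; the time axis is left untouched.
    layers.TimeDistributed(layers.MaxPooling3D(pool_size=(2, 2, 2))),
])

# One dummy batch: [batch, timeframes, x, y, z, channels]
x = np.zeros((1, frames, size, size, size, channels), dtype="float32")
pooled_shape = tuple(model(x).shape)
print(pooled_shape)
```

The output should still be 6D, with only the three spatial dimensions halved, so further ConvLSTM3D layers can be stacked on top of it.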

Elena Doe