I'm working with a TFRecord dataset of grayscale images of cross-sections of a 3D object, so each example has the shape [32, 256, 256]. The first dimension (32) is the number of cross-sections, and it is significantly smaller than the other two.
Because of this, I'm wondering whether I could treat the data as 2D data with 32 channels instead of 3D data with one channel, mainly to reduce the computational resources needed. I'm currently using TensorFlow with TPUs in Google Colab, and using `tf.layers.conv2d` instead of `tf.layers.conv3d` would save a lot of memory from less padding.
Is there any significant difference between the two approaches, or a convention I should follow? Would using `conv2d` hurt my accuracy in any way?
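To make the trade-off concrete, here is a back-of-the-envelope comparison of the per-layer parameter counts for the two views of the data. The kernel sizes (3×3 and 3×3×3) and the filter count (64) are hypothetical, just chosen for illustration:

```python
# Parameter count for a single convolutional layer: weights + biases.
# Kernel sizes and filter counts below are made-up example values.

def conv2d_params(kh, kw, c_in, c_out):
    """Params for a 2D conv with a (kh, kw) kernel."""
    return kh * kw * c_in * c_out + c_out

def conv3d_params(kd, kh, kw, c_in, c_out):
    """Params for a 3D conv with a (kd, kh, kw) kernel."""
    return kd * kh * kw * c_in * c_out + c_out

# 2D view: input is [256, 256] spatial with 32 channels.
p2d = conv2d_params(3, 3, c_in=32, c_out=64)    # 3*3*32*64 + 64 = 18496

# 3D view: input is [32, 256, 256] spatial with 1 channel.
p3d = conv3d_params(3, 3, 3, c_in=1, c_out=64)  # 3*3*3*1*64 + 64 = 1792

print(p2d, p3d)  # → 18496 1792
```

The counts also highlight the semantic difference I'm asking about: in the 2D view, the first layer fully connects across all 32 cross-sections at once (no weight sharing along depth), while the 3D view slides a small kernel along the depth axis and preserves translation equivariance in that direction.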