
I am new to both Python and neural networks. I am trying to build a CNN+RNN model for gesture classification in video. Each video consists of 30 frames (a batch of 30 images). I am using Conv2D layers for the CNN part and a GRU for the RNN part. The images are 84×84 RGB images (channels = 3). I am getting the error "ValueError: Input 0 is incompatible with layer gru1: expected ndim=3, found ndim=4" when I try to add the GRU layer. Below is my code:

    model1 = Sequential()
    model1.add(Conv2D(64, (3,3), strides=(1,1), padding='same', input_shape=(84,84,3),name='c2D1'))
    model1.add(BatchNormalization())
    model1.add(Activation('elu'))
    model1.add(MaxPooling2D(pool_size=(2,1), strides=(2,1)))

    model1.add(Conv2D(128, (3,3), strides=(1,1), padding='same',name='c2D2'))
    model1.add(BatchNormalization())
    model1.add(Activation('elu'))
    model1.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))



    model1.add(Conv2D(256, (3,3), strides=(1,1), padding='same',name='c2D3'))
    model1.add(BatchNormalization())
    model1.add(Activation('elu'))
    model1.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))


    model1.add(Conv2D(256, (3,3), strides=(1,1), padding='same',name='c2D4'))
    model1.add(BatchNormalization())
    model1.add(Activation('elu'))
    model1.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))


    model1.add(GRU(units=50,input_shape=(256,84,84),return_sequences=True,name='gru1'))
    model1.add(layers.Dense(nb_labels))


    model1.add(Flatten())
    model1.add(Dropout(0.5))
    model1.add(Dense(512, activation='elu'))
    model1.add(Dropout(0.5))
    model1.add(Dense(5, activation='softmax'))

Kindly let me know what the correct value of input_shape for the GRU layer should be.

ashutosh
  • Does this answer your question? [Keras ValueError: Input 0 is incompatible with layer conv2d\_1: expected ndim=4, found ndim=5](https://stackoverflow.com/questions/47665391/keras-valueerror-input-0-is-incompatible-with-layer-conv2d-1-expected-ndim-4) – Thomas Weller Mar 25 '20 at 21:44
  • Does this answer your question? https://stackoverflow.com/questions/44583254/valueerror-input-0-is-incompatible-with-layer-lstm-13-expected-ndim-3-found-n?rq=1 – Thomas Weller Mar 25 '20 at 21:44
  • Does this answer your question? https://stackoverflow.com/questions/54877516/valueerror-input-0-is-incompatible-with-layer-conv2d-5-expected-ndim-4-found?rq=1 – Thomas Weller Mar 25 '20 at 21:44
  • Does this answer your question? https://stackoverflow.com/questions/54118069/valueerror-input-0-is-incompatible-with-layer-conv1d-1-expected-ndim-3-found?rq=1 – Thomas Weller Mar 25 '20 at 21:45
  • Does this answer your question? https://stackoverflow.com/questions/56859738/how-to-fix-valueerror-input-0-is-incompatible-with-layer-flatten-expected-mi?rq=1 – Thomas Weller Mar 25 '20 at 21:45
  • Does this answer your question? https://stackoverflow.com/questions/58165813/how-to-fix-valueerror-input-0-is-incompatible-with-layer-lstm-2-expected-ndim?rq=1 – Thomas Weller Mar 25 '20 at 21:46

2 Answers


Conv2D takes a 3-dimensional image as input (width × height × channels). If you want to trick Keras into processing a color video, you should use Conv3D (at least in the first one or two layers). Conv3D takes as input width × height × depth × channels. You could keep your color channels as "channels" and use the depth dimension as the time axis.
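A minimal sketch of this idea, with the 30 frames mapped onto the depth axis (the layer sizes here are illustrative, not taken from the question):

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, Flatten, Dense

model = Sequential([
    # Each sample: 30 frames ("depth" = time) of 84x84 RGB images.
    Input(shape=(30, 84, 84, 3)),
    Conv3D(64, (3, 3, 3), padding='same', activation='elu'),
    MaxPooling3D((1, 2, 2)),   # pool only spatially, keep all 30 time steps
    Conv3D(128, (3, 3, 3), padding='same', activation='elu'),
    MaxPooling3D((2, 2, 2)),   # now also downsample along time
    Flatten(),
    Dense(5, activation='softmax'),
])
model.summary()
```

Note that with this approach the 3-D convolutions mix information across neighboring frames directly, so there is no recurrent layer at all.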

user26067

Your code does not do what you say you want it to do. Those convolutional layers expect a batch of images, but what you describe is a batch of sequences of images. For the GRU layer to work, it must be given a tensor of shape (batch_size, sequence_length, features); instead it receives a tensor of shape (batch_size, 5, 10, 256), with its input_shape parameter pointlessly set to (256, 84, 84).

To get what you want, make the convolutional part time-distributed (https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed), then flatten everything after the second dimension (using https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape), and only then apply the GRU. You do not have to tell the GRU layer the input shape, since it will infer it automatically.
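The steps above can be sketched as follows, assuming 30-frame clips of 84×84 RGB images; the filter counts are illustrative and do not reproduce the full four-block network from the question:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     TimeDistributed, Reshape, GRU, Dense)

n_frames = 30  # one clip = 30 frames of 84x84 RGB

model = Sequential([
    Input(shape=(n_frames, 84, 84, 3)),
    # TimeDistributed applies the wrapped layer to every frame independently.
    TimeDistributed(Conv2D(64, (3, 3), padding='same', activation='elu')),
    TimeDistributed(MaxPooling2D((2, 2))),   # -> (30, 42, 42, 64)
    TimeDistributed(Conv2D(128, (3, 3), padding='same', activation='elu')),
    TimeDistributed(MaxPooling2D((2, 2))),   # -> (30, 21, 21, 128)
    # Collapse each frame's feature map into one flat feature vector,
    # giving the (batch, sequence_length, features) shape the GRU expects.
    Reshape((n_frames, 21 * 21 * 128)),      # -> (30, 56448)
    GRU(50),  # input shape is inferred automatically
    Dense(5, activation='softmax'),
])
model.summary()
```

With this layout the model expects input batches of shape (batch_size, 30, 84, 84, 3) and outputs one 5-class prediction per clip.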

simon