I am new to both Python and neural networks. I am trying to build a CNN+RNN model for gesture classification in videos. Each video consists of 30 frames (a batch of 30 images). I am using Conv2D layers for the CNN part and a GRU layer for the RNN part. The images are 84x84 RGB images (3 channels). When I try to add the GRU layer, I get the error "ValueError: Input 0 is incompatible with layer gru1: expected ndim=3, found ndim=4". Below is my code:
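from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation, MaxPooling2D,
                          GRU, Flatten, Dropout, Dense)
model1 = Sequential()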
model1.add(Conv2D(64, (3,3), strides=(1,1), padding='same', input_shape=(84,84,3),name='c2D1'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling2D(pool_size=(2,1), strides=(2,1)))
model1.add(Conv2D(128, (3,3), strides=(1,1), padding='same',name='c2D2'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model1.add(Conv2D(256, (3,3), strides=(1,1), padding='same',name='c2D3'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model1.add(Conv2D(256, (3,3), strides=(1,1), padding='same',name='c2D4'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model1.add(GRU(units=50,input_shape=(256,84,84),return_sequences=True,name='gru1'))
model1.add(Dense(nb_labels))
model1.add(Flatten())
model1.add(Dropout(0.5))
model1.add(Dense(512, activation='elu'))
model1.add(Dropout(0.5))
model1.add(Dense(5, activation='softmax'))
Please let me know what the correct input_shape for the GRU layer should be.
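To make the shape mismatch concrete, here is a rough sketch in plain Python (no Keras needed) that traces one 84x84x3 frame through the convolution and pooling layers above. The helper names (`conv_same`, `max_pool_valid`, `cnn_output_shape`) are hypothetical and only for illustration, and the sketch assumes the Keras default `padding='valid'` for MaxPooling2D. It suggests that the CNN emits a 4-D tensor (batch, height, width, channels), while a GRU expects 3-D input (batch, timesteps, features).

```python
# Hypothetical helpers: trace (height, width, channels) through the CNN layers
# of the model above, assuming MaxPooling2D uses its default padding='valid'.

def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1 keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_valid(shape, pool, strides):
    """MaxPooling2D with padding='valid': floor((size - pool) / stride) + 1."""
    height, width, channels = shape
    return (
        (height - pool[0]) // strides[0] + 1,
        (width - pool[1]) // strides[1] + 1,
        channels,
    )


def cnn_output_shape(frame_shape=(84, 84, 3)):
    """Apply the four conv/pool stages in the same order as the model."""
    shape = conv_same(frame_shape, 64)
    shape = max_pool_valid(shape, (2, 1), (2, 1))
    shape = conv_same(shape, 128)
    shape = max_pool_valid(shape, (2, 2), (2, 2))
    shape = conv_same(shape, 256)
    shape = max_pool_valid(shape, (2, 2), (2, 2))
    shape = conv_same(shape, 256)
    shape = max_pool_valid(shape, (2, 2), (2, 2))
    return shape


per_frame = cnn_output_shape()
features_per_frame = per_frame[0] * per_frame[1] * per_frame[2]

# What the last pooling layer produces: batch axis + 3 axes = 4 dimensions.
cnn_batch_shape = (None,) + per_frame

# What a GRU wants per sample: (timesteps, features), i.e. 30 frames,
# each flattened to one feature vector (the batch axis is implicit).
gru_input_shape = (30, features_per_frame)

print("CNN output per frame:", per_frame)
print("CNN output with batch axis:", cnn_batch_shape)
print("GRU input_shape per video:", gru_input_shape)
```

Under these assumptions, one common way to get that 3-D shape in Keras is to declare the model input as (30, 84, 84, 3), wrap each per-frame layer in `TimeDistributed`, and apply `TimeDistributed(Flatten())` just before the GRU. This is a suggestion that has not been run against the code above.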