
This is my first time asking a question here, so I really do need help, and sorry for my English. I want to build a CNN-LSTM model for video classification in Keras, but I have a problem building my y_train. I have a video dataset (each video has 10 frames), and I converted the videos to images. First I split the dataset into X_train, X_test, y_train, and y_test (80% train, 20% test):

X_train, X_test = img_data[:trainco], img_data[trainco:]
y_train, y_test = y[:trainco], y[trainco:]

X_train shape : (2280, 64, 64, 1) -> 2280 images, 64x64 (height x width), 1 channel

y_train shape : (2280, 26) -> 26 classes

Then I have to reshape them before feeding them to the CNN-LSTM. Note: I do the same thing with X_test and y_test.

time_steps = 10  # 10 frames per video

X_train = X_train.reshape(int(X_train.shape[0] / time_steps), time_steps, X_train.shape[1], X_train.shape[2], X_train.shape[3])
y_train = y_train.reshape(int(y_train.shape[0] / time_steps), time_steps, y_train.shape[1])

X_train shape : (228, 10, 64, 64, 1), y_train shape : (228, 10, 26)

This is my model:

model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(2, 2), activation='relu', padding='same'), input_shape=X_train.shape[1:]))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(256, return_sequences=False, input_shape=(64, 64)))
model.add(Dense(128))
model.add(Dense(64))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
checkpoint = ModelCheckpoint(fname, monitor='acc', verbose=1, save_best_only=True, mode='max', save_weights_only=True)
hist = model.fit(X_train, y_train, batch_size=num_batch, epochs=num_epoch, verbose=1, validation_data=(X_test, y_test), callbacks=[checkpoint])

But I got an error that says

ValueError: Error when checking target: expected dense_3 to have 2 dimensions, but got array with shape (228, 10, 26)

Since it says the target is expected to have 2 dimensions, I changed the code to

y_train = y_train.reshape(int(y_train.shape[0] / time_steps), y_train.shape[1])

And I got an error again that says

ValueError: cannot reshape array of size 59280 into shape (228,26)

Then I changed the code again to

y_train = y_train.reshape(y_train.shape[0], y_train.shape[1])

And I still got an error:

ValueError: Input arrays should have the same number of samples as target arrays. Found 228 input samples and 2280 target samples.

What should I do? I understand what the problem is, but I don't know how to solve it. Please help me.

devyl

1 Answer


I recreated a slightly simplified version of your situation to reproduce the problem. Basically, because the LSTM layer is created with return_sequences=False, it puts out only one result for the entire sequence of time steps, which reduces the output from 3 dimensions to 2. The program below adds a model.summary() call, which prints the details of the architecture.

from keras import Sequential
from keras.layers import TimeDistributed, Dense, Conv2D, MaxPooling2D, Flatten, LSTM
import numpy as np

X_train = np.random.random((228, 10, 64, 64, 1))
y_train = np.random.randint(2, size=(228, 10, 26))
num_classes = 26

# Create the model
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(2, 2), activation='relu', padding='same'), input_shape=X_train.shape[1:]))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten(),name='Flatten'))
model.add(LSTM(256, return_sequences=False, input_shape=(64, 64)))
model.add(Dense(128))
model.add(Dense(64))
model.add(Dense(num_classes, activation='softmax', name='FinalDense'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])

# Print the output shape of every layer
model.summary()
# hist = model.fit(X_train, y_train, epochs=1)

I believe you'll need to decide if you want to reduce the dimension of the y_train (target) data to be consistent with the model, or change the model. I hope this helps.
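
As a rough sketch of the first option (reducing the target), here is a NumPy-only example. It assumes every frame of a video carries the same label and that frames are stored video by video; the labels below are made up for illustration and are not your data. It also shows why a plain reshape to (228, 26) fails: the array holds 59280 values and the new shape has room for only 5928.

```python
import numpy as np

num_videos, time_steps, num_classes = 228, 10, 26

# Hypothetical labels: one class id per video, repeated for each of its frames.
rng = np.random.default_rng(0)
video_class = rng.integers(0, num_classes, size=num_videos)
frame_class = np.repeat(video_class, time_steps)        # shape (2280,)
y_train = np.eye(num_classes, dtype=int)[frame_class]   # one-hot, shape (2280, 26)

# A plain reshape to (228, 26) cannot work: the element counts differ.
print(y_train.size, num_videos * num_classes)           # 59280 5928

# Group the frames by video first, then keep one label per video.
y_seq = y_train.reshape(num_videos, time_steps, num_classes)  # (228, 10, 26)
y_video = y_seq[:, 0, :]                                      # (228, 26)

# Sanity check for the assumption that all frames of a video agree.
frames_agree = bool((y_seq == y_video[:, None, :]).all())
print(y_video.shape, frames_agree)                      # (228, 26) True
```

If the check prints False for real data, the frames of at least one video disagree on the label, and picking a single frame's label would silently hide that.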

ad2004
  • Thank you for your answer. Yes, I'm fully aware that I should reduce the dimension from 3 to 2. But as I described in the question above, I already tried to reduce the dimension and got another error. What is the right way to reduce the dimension? – devyl Oct 29 '19 at 17:31
  • When you reduce the dimensions of the target, you have to keep the number of samples the same as the input x data (228 I think). This is why you are getting the other error. You might need to select say the last time step value of the current target to make the reduction. I believe something like y_target_new = np.squeeze(y_train[:,-1,:]) or something similar. I hope this helps. – ad2004 Oct 29 '19 at 17:40
  • OMG it works!! Thank you so much! :D By the way, can you tell me what np.squeeze(y_train[:,-1,:]) does? If I print the shape, it only prints (228, 26). Thank you once again :D – devyl Oct 29 '19 at 20:06
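
On the last question in the comments, a minimal NumPy sketch may help. It uses a zero-filled placeholder array with the target shape from the question (not real labels). The index -1 on the second axis picks the last time step of every video and drops that axis, so the result is already (228, 26). np.squeeze only removes axes of length 1, so in this case it appears to change nothing.

```python
import numpy as np

# Placeholder with the target shape from the question (values are irrelevant here).
y_train = np.zeros((228, 10, 26))

# An integer index on axis 1 selects the last time step and removes that axis.
last_step = y_train[:, -1, :]
print(last_step.shape)                       # (228, 26)

# np.squeeze drops axes of length 1; there are none, so the shape is unchanged.
squeezed = np.squeeze(last_step)
print(squeezed.shape)                        # (228, 26)

# A slice (-1:) keeps the axis with length 1; here squeeze does have work to do.
kept_axis = y_train[:, -1:, :]
print(kept_axis.shape)                       # (228, 1, 26)
print(np.squeeze(kept_axis, axis=1).shape)   # (228, 26)
```

One caveat: np.squeeze without an axis argument removes every length-1 axis, so with a single sample it would also drop the batch axis. Plain y_train[:, -1, :] avoids that.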