How to feed features extracted from video frames into an LSTM?

Question

I want to do anomaly detection on about a thousand videos. I have extracted the features of every frame of every video (using VGG16) and saved them in one file per video.

When I load a file from disk, I get an np.ndarray of shape (nb_frames, 25088). The 25088 dimension is the flattened output of VGG16 (VGG16 output: 1x7x7x512).
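For reference, 25088 is simply 7 × 7 × 512. A minimal NumPy sketch of how one frame's feature map becomes a row of the per-video array; random values stand in for real VGG16 output, and the frame count of 4 is an arbitrary assumption:

```python
import numpy as np

# Stand-in for the VGG16 convolutional output of one frame: (1, 7, 7, 512).
# Random values are used only to illustrate the shapes.
frame_features = np.random.rand(1, 7, 7, 512).astype(np.float32)

# Flattening merges the batch axis and the 7x7x512 feature map
# into a single vector of 7 * 7 * 512 = 25088 values.
flat = frame_features.reshape(-1)
print(flat.shape)  # (25088,)

# Stacking one flattened vector per frame gives the per-video array
# of shape (nb_frames, 25088) that is stored on disk.
nb_frames = 4  # illustrative value
video_features = np.stack([flat] * nb_frames)
print(video_features.shape)  # (4, 25088)
```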

I want to feed an LSTM K frames at a time. I have been trying for days and I still cannot make it work...

import numpy as np
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dropout, Dense

self.model = Sequential()
# LSTM needs 3-dimensional data (nb_samples, timesteps, input_dim)
self.model.add(CuDNNLSTM(32, return_sequences=True, batch_input_shape=(BATCH_SIZE, SIZE_WINDOW, 25088)))
self.model.add(Dropout(0.2))
# sigmoid, not softmax: a softmax over a single unit always outputs 1.0
self.model.add(Dense(1, activation='sigmoid'))
self.model.compile(loss='binary_crossentropy', optimizer="rmsprop", metrics=['accuracy'])
self.model.summary()

for (X_train, y_train) in self.batch_generator():
    self.model.fit(X_train, y_train, epochs=10)
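Separately from the shape error: a single-unit Dense output for a binary target needs a sigmoid rather than a softmax. A softmax normalizes across the units of the layer, so with only one unit it always returns exactly 1.0 and binary_crossentropy gets no usable signal. A small NumPy check; the two activation functions below are hand-written for illustration, not Keras' own implementations:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

# Logits a Dense(1) layer might produce for three samples: shape (3, 1).
logits = np.array([[-2.0], [0.0], [3.5]])

print(softmax(logits).ravel())  # [1. 1. 1.]  constant, carries no information
print(sigmoid(logits).ravel())  # values strictly between 0 and 1
```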

And here is my generator:

def batch_generator(self):
    # for all feature extracted files
    for video in self.videos:
        # video[0] contains the path to the file
        # video[1] contains the target (abnormal or not)
        x_train = np.load(video[0])  # load the video's features from disk

        nb_frames = x_train.shape[0]
        data = x_train.shape[1]

        # I've seen on stackoverflow I have to do that...
        x_train = x_train.reshape(nb_frames, data, 1)

        # The target is defined at video level, not frame level, so the same y is applied to every frame of
        # the current video
        y_train = np.array([video[1]] * nb_frames)

        # the output shape (the output *shape* is 2 dimensional according to someone on stackoverflow)
        y_train = y_train.reshape(y_train.shape[0], 1)

        nb_windows = len(x_train) // SIZE_WINDOW

        for window_index in range(0, nb_windows):
            start = window_index * SIZE_WINDOW
            end = (window_index + 1) * SIZE_WINDOW
            yield x_train[start:end], y_train[start:end]

I get the error:

ValueError: Error when checking input: expected cu_dnnlstm_input to have shape (30, 25088) but got array with shape (25088, 1)

30 is the number of frames (SIZE_WINDOW) I want the LSTM to process at a time.
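To see where `(25088, 1)` comes from: Keras treats the first axis of the array passed to `fit()` as the sample axis and checks the remaining axes against the model's input shape. A NumPy sketch of the shapes involved (the video length of 95 frames is an arbitrary assumption):

```python
import numpy as np

SIZE_WINDOW = 30    # frames per window, as in the question
N_FEATURES = 25088  # flattened VGG16 features per frame
nb_frames = 95      # illustrative video length

x_train = np.zeros((nb_frames, N_FEATURES), dtype=np.float32)

# What the question's generator does: add a trailing axis, then slice frames.
x_bad = x_train.reshape(nb_frames, N_FEATURES, 1)[0:SIZE_WINDOW]
print(x_bad.shape)      # (30, 25088, 1): 30 samples...
print(x_bad.shape[1:])  # (25088, 1): ...each of this shape, which Keras rejects

# What the model expects: a leading batch axis, then (timesteps, features).
x_good = x_train[0:SIZE_WINDOW][np.newaxis, ...]
print(x_good.shape)      # (1, 30, 25088)
print(x_good.shape[1:])  # (30, 25088)
```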

Also, whenever I try to reorder the dimensions, I get the same error with different values...
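Reordering the axes cannot help, because what is missing is an extra leading axis: each array passed to `fit()` must be (batch, SIZE_WINDOW, 25088). Below is a minimal sketch of a per-video generator that builds that shape, under these assumptions: the name `window_batches` is hypothetical; the video-level label is repeated for every timestep, to match `return_sequences=True` followed by `Dense(1)` (which outputs (batch, timesteps, 1)); and only complete batches are yielded, because `batch_input_shape` fixes the batch size.

```python
import numpy as np

def window_batches(features, label, size_window, batch_size):
    """Hypothetical helper: cut one video's (nb_frames, n_features) array into
    non-overlapping windows and yield them batch_size windows at a time.

    Yields x with shape (batch_size, size_window, n_features) and
    y with shape (batch_size, size_window, 1).
    """
    nb_windows = len(features) // size_window
    # Drop trailing frames that do not fill a whole window.
    usable = features[: nb_windows * size_window]
    # (nb_windows * size_window, n_features) -> (nb_windows, size_window, n_features)
    windows = usable.reshape(nb_windows, size_window, features.shape[1])
    # The label is defined per video, so every timestep gets the same target.
    targets = np.full((nb_windows, size_window, 1), label, dtype=np.float32)
    # Only full batches are yielded, since batch_input_shape fixes the batch size.
    for start in range(0, nb_windows - batch_size + 1, batch_size):
        yield windows[start:start + batch_size], targets[start:start + batch_size]

# Usage with a small stand-in: 95 frames, 8 features, windows of 30, batches of 2.
demo = np.zeros((95, 8), dtype=np.float32)
for x, y in window_batches(demo, label=1, size_window=30, batch_size=2):
    print(x.shape, y.shape)  # (2, 30, 8) (2, 30, 1)
```

If one label per window is preferred instead, `return_sequences=False` on the LSTM together with targets of shape (batch, 1) would be the corresponding variant.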

Edit: here is my code after applying the first answer's solution. It raises a ValueError (cannot reshape):

        for window_index in range(0, nb_windows):
            start = window_index * SIZE_WINDOW
            end = (window_index + 1) * SIZE_WINDOW

            chunk = np.array(x_train[start:end])
            chunk = chunk.reshape(int(nb_frames / SIZE_WINDOW), SIZE_WINDOW, data)

            yield chunk, y_train[start:end]
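The reshape fails because `chunk` holds a single window, i.e. SIZE_WINDOW × data values, while the target shape asks for nb_frames // SIZE_WINDOW windows. A scaled-down reproduction; the tiny sizes are stand-ins chosen for readability:

```python
import numpy as np

SIZE_WINDOW = 3
data = 4       # small stand-in for 25088
nb_frames = 7  # illustrative

x_train = np.arange(nb_frames * data, dtype=np.float32).reshape(nb_frames, data)
chunk = x_train[0:SIZE_WINDOW]  # shape (3, 4): only 12 values
print(chunk.size)               # 12

target = (nb_frames // SIZE_WINDOW, SIZE_WINDOW, data)  # (2, 3, 4): 24 values
try:
    chunk.reshape(target)
except ValueError as err:
    print(err)  # e.g. cannot reshape array of size 12 into shape (2,3,4)

# A single window only needs a leading batch axis of length 1.
print(chunk.reshape(1, SIZE_WINDOW, data).shape)  # (1, 3, 4)
```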

If I do the reshape here instead, before the loop, the error remains:

        [...]
        # I've seen on stackoverflow I have to do that...
        # x_train = x_train.reshape(nb_frames, data, 1)
        x_train = x_train.reshape(int(nb_frames / SIZE_WINDOW), SIZE_WINDOW, data)
        [...]

Rifat Alptekin Çetin · Answer 1 · 2019-02-17T01:55:50.613

Change the reshape:

x_train = x_train[:len(x_train) - (len(x_train) % SIZE_WINDOW)]  # drop frames that don't fill a whole window
x_train = x_train.reshape(len(x_train) // SIZE_WINDOW, SIZE_WINDOW, data)

Sorry, my bad.
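The truncate-then-reshape idea can be checked on a small array. The remainder has to be taken on the number of frames, `len(x_train) % SIZE_WINDOW` (writing `x_train % SIZE_WINDOW` would apply the modulo to every feature value instead), and the reshape has to use the truncated length. The sizes below are illustrative assumptions:

```python
import numpy as np

SIZE_WINDOW = 30
data = 5          # small stand-in for 25088 features
nb_frames = 95    # deliberately not a multiple of SIZE_WINDOW
x_train = np.random.rand(nb_frames, data)

# Trim the trailing frames that do not fill a complete window (95 % 30 = 5).
x_train = x_train[:len(x_train) - (len(x_train) % SIZE_WINDOW)]
print(x_train.shape)  # (90, 5)

# The element count now divides evenly into windows.
x_train = x_train.reshape(len(x_train) // SIZE_WINDOW, SIZE_WINDOW, data)
print(x_train.shape)  # (3, 30, 5)
```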

edited Feb 17 '19 at 01:55

answered Feb 16 '19 at 17:27

Unfortunately, it raises an error: `ValueError: cannot reshape array of size 68716032 into shape (91,30,25088)`. Wherever I put your reshape (just before the yield or before the for loop), I get the same error. I have edited my question so you can see where I put your instruction. – Mourad Qqch Feb 16 '19 at 19:51
Maybe if I create an ndarray filled with zeros and copy my data into it, it will work? I'll try that, but I don't know whether it will affect the learning process. – Mourad Qqch Feb 16 '19 at 20:01
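The numbers in that error point to leftover frames: 68716032 / 25088 = 2739 frames, while 91 × 30 = 2730, so 9 frames do not fill a complete window and the reshape cannot succeed until they are removed. Dropping them loses 9 of 2739 frames; the zero-padding idea from the second comment instead adds 21 artificial all-zero frames that the model will see as input. A minimal padding sketch; the helper name `pad_to_window` is hypothetical, and 4 features stand in for 25088 to keep the array small:

```python
import numpy as np

def pad_to_window(x, size_window):
    """Hypothetical helper: append all-zero frames so that len(x) becomes a
    multiple of size_window, then reshape to (nb_windows, size_window, n_features)."""
    nb_frames, n_features = x.shape
    pad = (-nb_frames) % size_window  # 0 if already a multiple
    padded = np.concatenate([x, np.zeros((pad, n_features), dtype=x.dtype)])
    return padded.reshape(-1, size_window, n_features)

# Frame count recovered from the error message above: 2739 frames, windows of 30.
nb_frames, size_window = 2739, 30
print(nb_frames * 25088)                       # 68716032
print(nb_frames // size_window * size_window)  # 2730 -> 9 frames left over

x = np.ones((nb_frames, 4), dtype=np.float32)
windows = pad_to_window(x, size_window)
print(windows.shape)                           # (92, 30, 4)
```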
