(TensorFlow) TimeDistributed layer for image classification

Question

I know that “Time Distributed” layers are used when we have several chronologically ordered images, for example to detect movements, actions, or directions. However, I work on speech classification using spectrograms. Each utterance is transformed into a spectrogram, which is later fed to a neural network for classification. My dataset therefore consists of 2093 RGB images (100x100x3). So far I have used a CNN, with the input prepared as:

x_train = np.array(x_train).reshape(2093,100,100, 3)

And everything works fine.

But now I would like to use a CNN+BLSTM (similar to the following picture, which is taken from this paper), which means I will need time steps. So every image has to be divided into smaller frames.

The question is: how should I prepare the data for this?

Assume that I want to divide every image into 10 frames (time steps). Should I just reshape the data like this:

x_train = np.array(x_train).reshape(2093,10,10,100, 3)

This runs without errors, but I'm not sure whether it is the right thing to do, or whether there is another way.
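One thing worth checking is which image axis that reshape actually cuts. The sketch below assumes the images are stored as (N, height, width, channels); the helper name `split_into_time_frames` and the small stand-in batch are illustrative and not part of the original code. It shows that the plain reshape is equivalent to cutting each image into 10 horizontal strips (along the height axis), and how to cut along the width axis instead. Which axis represents time depends on how the spectrogram images were generated, so that choice is an assumption to verify against the data.

```python
import numpy as np

def split_into_time_frames(images, n_frames, time_axis):
    """Cut a batch of shape (N, H, W, C) into n_frames equal chunks along
    time_axis and stack the chunks on a new axis 1 (the time-step axis)."""
    images = np.asarray(images)
    if images.shape[time_axis] % n_frames != 0:
        raise ValueError("axis length must be divisible by n_frames")
    chunks = np.split(images, n_frames, axis=time_axis)
    return np.stack(chunks, axis=1)

# Small stand-in for the real (2093, 100, 100, 3) array.
x = np.arange(4 * 100 * 100 * 3, dtype=np.float32).reshape(4, 100, 100, 3)

# Cutting along the height axis (axis 1): frames of 10 rows x 100 columns.
frames_h = split_into_time_frames(x, 10, time_axis=1)

# Cutting along the width axis (axis 2): frames of 100 rows x 10 columns.
frames_w = split_into_time_frames(x, 10, time_axis=2)

# The plain reshape from the question is the same as the height-axis split.
same_as_reshape = np.array_equal(frames_h, x.reshape(4, 10, 10, 100, 3))

print(frames_h.shape, frames_w.shape, same_as_reshape)
```

If time runs along the width of the images, the per-frame shape becomes (100, 10, 3) rather than (10, 100, 3), and the model's input shape would have to change accordingly.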

This is the model that I'm using:

import tensorflow as tf

model = tf.keras.Sequential([
    # Each sample is a sequence of 10 frames of shape 10x100x3, matching the
    # reshape above (the original input_shape=(100,100,3) on the inner Conv2D
    # did not match the 5D input).
    tf.keras.Input(shape=(10, 10, 100, 3)),

    # CNN applied to every frame independently
    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', name="conv1")),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),

    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=128, kernel_size=2, padding='same', activation='relu')),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),

    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=256, kernel_size=2, padding='same', activation='relu')),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),

    # One feature vector per frame, then a bidirectional LSTM over the frames
    tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
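As a sanity check on the shapes, the following sketch traces a single 10x100 frame through the three pooling stages by hand. It assumes the Keras defaults for `MaxPooling2D` (stride equal to the pool size, `padding='valid'`), so each pooling halves a dimension and rounds down, while the `padding='same'` convolutions leave the spatial size unchanged. The helper `pooled_size` is a hypothetical name used only for this illustration.

```python
def pooled_size(n, pool=2, n_pools=3):
    """Size of one spatial dimension after n_pools max-pooling layers with
    pool size and stride `pool` and 'valid' padding (floor division)."""
    for _ in range(n_pools):
        n = n // pool
    return n

time_steps = 10                # frames per image, from the reshape
frame_h, frame_w = 10, 100     # per-frame spatial size
last_filters = 256             # filters in the last Conv2D

h_out = pooled_size(frame_h)   # 10 -> 5 -> 2 -> 1
w_out = pooled_size(frame_w)   # 100 -> 50 -> 25 -> 12

# Flatten turns each frame into one vector; the BLSTM sees one per time step.
features_per_step = h_out * w_out * last_filters
lstm_input_shape = (time_steps, features_per_step)

print(lstm_input_shape)
```

Under these assumptions the height of each frame collapses to a single row after the third pooling, so the BLSTM receives 10 time steps of 3072 features each.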

With this model I got 47% training accuracy and 46% validation accuracy, whereas with the CNN alone I got 95% on training and 71% on validation. Could anyone give me a hint on how to solve this problem?
