
I am trying to write a fairly complicated (at least for me) neural network in Keras that needs to combine a standard CNN structure with an LSTM/GRU layer.

Basically, I have a dataset of climatological maps of the Mediterranean Sea; each map details the wind, pressure and other parameters. I am studying Medicanes (Mediterranean hurricanes), and my goal is to create a neural network that can classify each map with a label of zero if there is no trace of such a hurricane, or one if the map contains one.

In order to achieve that I need a network with two parts:

  1. feature extractor (normal CNN).
  2. temporal layer (LSTM/GRU).

The reason for this is that each map is correlated with the previous ones, since the formation and life cycle of a Medicane can take several days to complete.

Important note: the dataset is too big to be loaded into memory all at once, so I have to work one batch at a time.


I am working with Keras, and I found it pretty challenging to adapt its standard framework to my needs, so I have come up with a somewhat peculiar flow to feed my data into the network.

In particular, I found it hard to pass both my batch size and my time-step parameter to the GRU layer in a more standard way.
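For context, in the standard Keras pattern the batch size never appears in the layer shapes at all; only the number of time steps does. A toy sketch (not the model below, just an illustration of how the two dimensions are usually declared):

from tensorflow import keras
from tensorflow.keras import layers

# Toy example only: 15 time steps of 600x600 RGB frames.
# The batch dimension stays implicit; it is only chosen later, e.g. in model.fit(..., batch_size=4).
frames = keras.Input(shape=(15, 600, 600, 3))
per_frame = keras.Sequential([                          # stand-in per-frame feature extractor
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
features = layers.TimeDistributed(per_frame)(frames)    # -> (batch, 15, 8)
summary = layers.GRU(64)(features)                      # -> (batch, 64)
toy_model = keras.Model(frames, summary)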

This is what I tried:

I am positively sure I have overcomplicated the task but, as I said, I am not very proficient with Keras and TensorFlow.

The main problem was that I could not find a way to load the data both in batches (for RAM reasons) and as sequences of 10-15 pictures (to be used as the time steps in the GRU layer).

I solved this problem by importing batches of 120 maps in order (no shuffle), creating a way to turn these batches into the sequences of images I needed, and then re-batching the sequences and feeding them to the model manually.


Data Import

import tensorflow as tf

batch_size = 120

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "./Figures_1/Train",
    validation_split=None,
    subset=None,
    labels="inferred",
    label_mode="binary",
    color_mode="rgb",
    interpolation='bilinear',
    batch_size=batch_size,
    image_size=(600, 600),
    shuffle=False,
    seed=123
)

Get a Sequence of Images

Here, I break each 120-map batch down into sequences of 60 observations and return one sequence at a time.

import numpy as np
import tensorflow_datasets as tfds

sequence_length = 60

def sequence_x(train_dataset):

    # Materialize the image batches as NumPy arrays (note: this pulls the whole split into memory)
    x_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))), dtype=object)

    for element in range(x_numpy.shape[0]):
        # Split each 120-image batch into consecutive 60-image sequences
        for i in range(0, x_numpy[element].shape[0], sequence_length):
            yield x_numpy[element][i:i + sequence_length]

def sequence_y(train_dataset):

    # Same as above, but for the labels
    y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))), dtype=object)

    for element in range(y_numpy.shape[0]):
        for i in range(0, y_numpy[element].shape[0], sequence_length):
            yield y_numpy[element][i:i + sequence_length]
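As a quick sanity check (illustrative), each call to the generators above should yield one 60-map chunk and its labels:

# Illustrative check of the generator output shapes
x_gen, y_gen = sequence_x(train_ds), sequence_y(train_ds)
x_seq, y_seq = next(x_gen), next(y_gen)
print(x_seq.shape, y_seq.shape)   # expected: (60, 600, 600, 3) and (60, 1) with label_mode="binary"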

CNN Model

I build the CNN model based on a pre-trained DenseNet

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import TimeDistributed, GRU, Dense, Dropout

def build_convnet(shape=(600, 600, 3)):

    inputs = keras.Input(shape=shape)

    # Preprocessing expected by DenseNet
    x = keras.applications.densenet.preprocess_input(inputs)

    # Backbone (convBase is the DenseNet121 defined in the Transfer Learning section below);
    # with pooling="avg" it already returns a flat feature vector per image
    x = convBase(x)
    x = layers.Flatten()(x)

    # Fine-tuning head
    x = Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    x = Dense(512, activation='relu')(x)

    # Return an actual Model so it can be wrapped in TimeDistributed
    return keras.Model(inputs, x)

GRU Model

I build the temporal part of the network with a GRU layer

def action_model(shape=(15, 600, 600, 3), nbout=15):
    # Create the per-frame convnet with a (600, 600, 3) input shape
    convnet = build_convnet(shape[1:])

    # Then create the final model
    model = keras.Sequential()
    # Apply the convnet to every frame of the (15, 600, 600, 3) sequence
    model.add(TimeDistributed(convnet, input_shape=shape))
    # Temporal layer (a GRU here; an LSTM would also work)
    model.add(GRU(64))
    # And finally, the decision network
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(15, activation='softmax'))
    return model
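As a quick shape check (illustrative, and assuming convBase from the Transfer Learning section below has already been defined), the tensors flow as follows:

# Illustrative shape walk-through of action_model((15, 600, 600, 3))
# TimeDistributed(convnet): (batch, 15, 512)   one 512-dim feature vector per frame
# GRU(64):                  (batch, 64)        one summary vector per sequence
# final Dense(15, softmax): (batch, 15)        one 15-way softmax per sequence
m = action_model((15, 600, 600, 3))
m.summary()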

Transfer Learning

I make part of the DenseNet backbone trainable

from tensorflow.keras.applications import DenseNet121

convBase = DenseNet121(include_top=False, weights=None, input_shape=(600, 600, 3), pooling="avg")

# Unfreeze the conv4 and conv5 blocks of the backbone
for layer in convBase.layers:
  if 'conv4' in layer.name or 'conv5' in layer.name:
    layer.trainable = True
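Note that Keras layers are trainable by default, so the loop above only has an effect if the rest of the backbone is frozen first. A minimal sketch of the usual freeze-then-unfreeze pattern (an assumption about the intent, not part of the original code):

# Freeze every backbone layer first, then selectively unfreeze the last two blocks.
# Freezing layer by layer (rather than setting convBase.trainable = False) keeps the
# per-layer flags effective when convBase is nested inside another model.
for layer in convBase.layers:
    layer.trainable = False
for layer in convBase.layers:
    if 'conv4' in layer.name or 'conv5' in layer.name:
        layer.trainable = True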

Model Compile

Model compilation (image size = 600x600x3)

INSHAPE = (15, 600, 600, 3)  # (time steps, height, width, channels)
model = action_model(INSHAPE, 1)
optimizer = keras.optimizers.Adam(0.001)

model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Model Fit

Here I manually batch my data: I turn a (60, 600, 600, 3) array into a (4, 15, 600, 600, 3) array, i.e. 4 sequences of 15 maps each.


epochs = 10

for epoch in range(epochs):

    # Re-create the generators at the start of every epoch
    train_x, train_y = sequence_x(train_ds), sequence_y(train_ds)
    val_x, val_y = sequence_x(validation_ds), sequence_y(validation_ds)

    for i in range(278):  # hard-coded number of 60-map chunks consumed per epoch

        x = next(train_x, None)
        y = next(train_y, None)

        # Skip exhausted generators and empty chunks
        if x is None or y is None:
            continue
        if not (np.any(x) and np.any(y)):
            continue

        # Re-batch the 60-map chunk as 4 sequences of 15 frames each
        x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
        y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))
        y_stack = y_stack.reshape(4, 15)

        model.fit(x=x_stack, y=y_stack,
                  validation_data=None,
                  batch_size=None,
                  shuffle=False)
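For what it's worth, assuming a chunk always holds exactly 60 maps, the same re-batching can be written as a single reshape:

# Equivalent re-batching via reshape (assumes the chunk holds exactly 60 frames)
x_stack = x.reshape(4, 15, 600, 600, 3)
y_stack = y.reshape(4, 15)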

The idea is to get a model that, when presented with a sequence of images, can label each of them with a 1 if it contains a Medicane and a 0 otherwise.


The model compiles without any errors, but the results it produces are horrible:

(Image 1).

What am I doing incorrectly? Is there a more effective way to write all of this?

Ahmad Othman
  • Everything seems fine to me until the training part. I can see in the screenshot that each epoch contains only a single batch. You shouldn't call ``model.fit`` for each batch. It is designed to iterate over a dataset, which provides the training with batches of data. I suggest you wrap the sequence creation inside a ``tf.data.Dataset`` and call the ``model.fit`` just once. You could also benefit from doing the sequence preparation in Tensorflow so that you don't convert the data from Tensorflow to Numpy and back, creating unnecessary processing costs. – Ladislav Ondris Jan 13 '23 at 22:50
  • So I tried to follow this route using ``tf.data.Dataset.from_generator`` but, as I said, I cannot load the entire dataset into RAM. So basically the idea is good, but I have a RAM constraint that does not let me create this kind of dataset. If I am wrong, please show me how I am supposed to do it without having the entire dataset loaded in my RAM. – Finest Whiskey Jan 15 '23 at 17:29
  • Not having to load your entire dataset into RAM is exactly what the ``tf.data.Dataset`` is for. The following article should help you get started: https://www.tensorflow.org/guide/data. I suggest you also read the optimization article https://www.tensorflow.org/guide/data_performance, which will help you understand how it works under the hood. – Ladislav Ondris Jan 16 '23 at 08:08
  • Practically speaking, if you have a dataset of images, you don't want them all loaded in RAM. You load them batch by batch only when the training loop requests them. Instead, you start by providing the tf.data.Dataset with the image paths to all the images, set up a ``tf.data.Dataset.map`` function that reads the image, reshapes it, etc. And then, create a batch using the ``tf.data.Dataset.batch`` function. If you struggle to develop a pipeline for your use case, I recommend experimenting with the simple ones from the Tensorflow tutorials first. – Ladislav Ondris Jan 16 '23 at 08:13
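For completeness, here is a minimal sketch of the pipeline the last comment describes: list the image paths, decode lazily inside a ``map``, then batch into sequences. The file pattern, PNG format, label rule and sequence length are assumptions made for illustration only.

import tensorflow as tf

SEQ_LEN = 15           # illustrative sequence length
IMG_SIZE = (600, 600)

# 1. Start from file paths only; nothing is loaded into RAM yet.
#    shuffle=False keeps a deterministic file order (the maps must stay in temporal order).
paths_ds = tf.data.Dataset.list_files("./Figures_1/Train/*/*.png", shuffle=False)

def load_image(path):
    # 2. Read and decode one image only when the pipeline requests it.
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    # Illustrative label rule: assumes the class subfolders are named "0" and "1".
    label = tf.strings.to_number(tf.strings.split(path, "/")[-2])
    return image, label

images_ds = paths_ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)

# 3. Group consecutive frames into fixed-length sequences, then batch the sequences.
sequences_ds = (images_ds
                .batch(SEQ_LEN, drop_remainder=True)   # -> (15, 600, 600, 3), (15,)
                .batch(4)                               # -> (4, 15, 600, 600, 3), (4, 15)
                .prefetch(tf.data.AUTOTUNE))

# model.fit(sequences_ds, epochs=10) then streams batches from disk.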

0 Answers