
I've been working with a dataset of shape (1000, 3253) using a CNN. I'm computing gradients through tf.GradientTape, but the script keeps running out of memory. Yet if I remove the line that appends a gradient calculation to a list, the script runs through all the epochs. I'm not entirely sure why this happens, and I'm also new to TensorFlow and GradientTape, so any advice or input would be appreciated.

    # batch loop
    for x, y_true in train_dataset:
        # create a tape to record operations
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            x_var = tf.Variable(x)
            tape.watch([model.trainable_variables, x_var])

            y_pred = model(x_var, training=True)
            tape.stop_recording()
            loss = los_func(y_true, y_pred)

        epoch_loss_avg.update_state(loss)
        epoch_accuracy.update_state(y_true, y_pred)

        # gradients w.r.t. the model weights and w.r.t. the input batch
        gradients, input_gradients = tape.gradient(loss, (model.trainable_variables, x_var))
        #sa_input.append(tape.gradient(loss, x_var))
        del tape

        # store the input gradients and apply the weight gradients
        sa_input.append(input_gradients)
        opti_func.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())
  • Can you please share the model.summary()? Did you try reducing the number of trainable parameters of the model to see if that fixes the issue, perhaps by adding max-pooling layers and shrinking the dense layers? Also, can you share the complete code, or reproducible code as a Google Colab link? –  May 18 '20 at 10:02
  • It seems like TensorFlow really wants you to use its built-in functions. I found that instead of appending to a list, using TensorFlow's concatenate function works (a sketch of this is shown below these comments). – nauge May 26 '20 at 15:37
  • Appending to the list also works, which we have tested, so I'm not sure the OOM has anything to do with the list. Did TensorFlow's concatenate function fix your issue? Can you please share the code? –  May 26 '20 at 15:45
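
A minimal sketch of the concatenate approach mentioned in the comment above, reusing the names from the question (model, los_func, opti_func, train_dataset); this only illustrates swapping the Python list for a single growing tensor, and whether it resolves the OOM depends on how much gradient data ends up kept on the GPU:

import tensorflow as tf

# sa_input holds all per-sample input gradients as one tensor instead of a Python list
sa_input = None

for x, y_true in train_dataset:
    x_var = tf.Variable(x)
    with tf.GradientTape() as tape:
        y_pred = model(x_var, training=True)
        loss = los_func(y_true, y_pred)
    # gradients w.r.t. the weights and w.r.t. the input batch
    gradients, input_grad = tape.gradient(loss, (model.trainable_variables, x_var))

    # grow one tensor along the batch axis rather than appending tensors to a list
    sa_input = input_grad if sa_input is None else tf.concat([sa_input, input_grad], axis=0)

    opti_func.apply_gradients(zip(gradients, model.trainable_variables))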

1 Answer


As you are new to TF2, I would recommend going through this guide. It covers training, evaluation, and prediction (inference) models in TensorFlow 2.0 in two broad situations:

  1. When using built-in APIs for training & validation (such as model.fit(), model.evaluate(), model.predict()). This is covered in the section "Using built-in training & evaluation loops".
  2. When writing custom loops from scratch using eager execution and the GradientTape object. This is covered in the section "Writing your own training & evaluation loops from scratch".
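
For orientation, a bare-bones loop for the second situation might look like the sketch below; model, loss_fn, optimizer, train_dataset, and num_epochs are placeholders for objects assumed to be defined elsewhere:

import tensorflow as tf

# Minimal custom training loop with GradientTape (sketch only)
for epoch in range(num_epochs):
    for x_batch, y_batch in train_dataset:
        with tf.GradientTape() as tape:
            logits = model(x_batch, training=True)   # forward pass recorded by the tape
            loss = loss_fn(y_batch, logits)
        grads = tape.gradient(loss, model.trainable_variables)  # backward pass
        optimizer.apply_gradients(zip(grads, model.trainable_variables))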

Below is a program where I compute the gradients after every epoch and append them to a list. At the end of the program, I convert the list to an array for simplicity.

Code - This program throws an OOM error if I use a deeper network with many layers and bigger filter sizes:

# Importing dependency
%tensorflow_version 2.x
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras import datasets
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
import tensorflow as tf

# Import Data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Build Model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32,32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10))

# Model Summary
model.summary()

# Model Compile 
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# List to collect the gradients and the loss function
epoch_gradient = []
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Define the Gradient Function
@tf.function
def get_gradient_func(model):
    with tf.GradientTape() as tape:
       logits = model(train_images, training=True)
       loss = loss_fn(train_labels, logits)    
    grad = tape.gradient(loss, model.trainable_weights)
    model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
    return grad

# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs={}):
    grad = get_gradient_func(model)
    epoch_gradient.append(grad)

epoch = 4

print(train_images.shape, train_labels.shape)

model.fit(train_images, train_labels, epochs=epoch, validation_data=(test_images, test_labels), callbacks=[GradientCalcCallback()])

# Convert to a 2-dimensional array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epoch)

Output -

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_12 (Conv2D)           (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 4, 4, 64)          36928     
_________________________________________________________________
flatten_4 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_11 (Dense)             (None, 64)                65600     
_________________________________________________________________
dense_12 (Dense)             (None, 10)                650       
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
(50000, 32, 32, 3) (50000, 1)
Epoch 1/4
1563/1563 [==============================] - 109s 70ms/step - loss: 1.7026 - accuracy: 0.4081 - val_loss: 1.4490 - val_accuracy: 0.4861
Epoch 2/4
1563/1563 [==============================] - 145s 93ms/step - loss: 1.2657 - accuracy: 0.5506 - val_loss: 1.2076 - val_accuracy: 0.5752
Epoch 3/4
1563/1563 [==============================] - 151s 96ms/step - loss: 1.1103 - accuracy: 0.6097 - val_loss: 1.1122 - val_accuracy: 0.6127
Epoch 4/4
1563/1563 [==============================] - 152s 97ms/step - loss: 1.0075 - accuracy: 0.6475 - val_loss: 1.0508 - val_accuracy: 0.6371
Total number of epochs run: 4

Hope this answers your question. Happy Learning.
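
One additional note on the OOM in the original loop, offered as my own reading rather than something stated above: appending the raw gradient tensors to a Python list keeps every batch's tensors alive on the GPU, so memory use can grow with the number of batches. A hedged variant of the callback above that stores the gradients as NumPy arrays in host memory before appending them (the rest of the program is unchanged):

# Variant of GradientCalcCallback: convert each gradient tensor to a NumPy
# array so the GPU copies can be released instead of accumulating per epoch.
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        grad = get_gradient_func(model)
        epoch_gradient.append([g.numpy() for g in grad])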

  • @nauge - Hope we have answered your question. Can you please accept and upvote the answer if you are satisfied with the answer. –  Jun 10 '20 at 06:37