
I'm trying to figure out the reason for the slow performance I get when training with a GPU runtime.

Data: ragged tensors with 11 features per timestep; shape: (49724, None, 11). (Download dataset from Dropbox)
Targets: each sample yields 3 targets ranging between 0 and 1; shape: (49724, 3). (Download targets from Dropbox)

The model is an LSTM network whose input layer uses the ragged=True attribute. Code:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam

config = {
    'learning_rate': 0.001,
    'lstm_neurons': 32,
    'lstm_activation': 'tanh',
    'dropout_rate': 0.08,
    'batch_size': 128,
    'dense_layers': [
        {'neurons': 32, 'activation': 'relu'},
        {'neurons': 32, 'activation': 'relu'},
    ]
}

def get_model(num_features, output_size):
    opt = Adam(learning_rate=config['learning_rate'])
    model = Sequential()
    model.add(Input(shape=[None, num_features], dtype=tf.float32, ragged=True))
    model.add(LSTM(config['lstm_neurons'], activation=config['lstm_activation']))
    model.add(BatchNormalization())
    if 'dropout_rate' in config:
        model.add(Dropout(config['dropout_rate']))

    for layer in config['dense_layers']:
        model.add(Dense(layer['neurons'], activation=layer['activation']))
        model.add(BatchNormalization())
        if 'dropout_rate' in layer:
            model.add(Dropout(layer['dropout_rate']))

    model.add(Dense(output_size, activation='sigmoid'))
    model.compile(loss='mse', optimizer=opt, metrics=['mse'])
    model.summary()
    return model

model = get_model(11, 3)

I've created two Google Colab notebooks to demonstrate the issue, one with a GPU runtime and the other with a CPU runtime:

GPU Colab
CPU Colab

On the GPU runtime, one epoch takes 869s, while on the CPU runtime it takes only 252s!
My question is: why, and can I do something about it?
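(One avenue I'm considering, sketched below under assumptions I haven't verified on this dataset: ragged inputs generally prevent Keras from dispatching the LSTM to the fused cuDNN kernel, so the GPU falls back to a slow generic implementation. Right-padding the sequences into a dense tensor and masking the padding may restore the fast path. `X_ragged` here is a tiny stand-in for the real data.)

```python
import tensorflow as tf

# Tiny stand-in for the real ragged training data: 2 samples, 11 features.
X_ragged = tf.ragged.constant(
    [[[0.1] * 11] * 5, [[0.2] * 11] * 3], ragged_rank=1)

# Right-pad with zeros into a regular dense tensor of shape (2, 5, 11).
X_dense = X_ragged.to_tensor(default_value=0.0)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 11)),
    # Masking lets the LSTM ignore the zero padding; with right-padded
    # sequences and default LSTM arguments (tanh activation, sigmoid
    # recurrent activation), TF can still use the cuDNN kernel on GPU.
    tf.keras.layers.Masking(mask_value=0.0),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(3, activation='sigmoid'),
])
model.compile(loss='mse', optimizer='adam')
```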

Shlomi Schwartz
  • In the GPU Colab, I see that you ran your model for just one epoch. Have you checked by running a few more epochs? – Innat Apr 11 '21 at 12:47
  • Yes I did, same results. CPU is much faster. – Shlomi Schwartz Apr 11 '21 at 12:57
  • 3
    GPU is not always faster, specially when small models are involved (like yours). – Dr. Snoopy Apr 11 '21 at 14:16
  • @ShlomiSchwartz have you checked this: [Training a simple model in Tensorflow GPU slower than CPU](https://stackoverflow.com/questions/55749899/training-a-simple-model-in-tensorflow-gpu-slower-than-cpu) – Innat Apr 11 '21 at 14:27
  • 1
    @ShlomiSchwartz if you change your batch size to 512, you wound see for CPU per epoch takes 283s where GPU per epoch takes 208s. – Innat Apr 11 '21 at 18:12

0 Answers