I installed CUDA and cuDNN following the instructions on the TF help page, and everything appears to be working correctly. If I print the available GPUs I get:
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Out: Num GPUs Available: 1
Also, when I start training the Sequential model, the output shows that all the necessary libraries loaded correctly and that a GPU device was successfully created:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4733 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
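The log above only shows that a GPU device was created. One way to check where ops actually execute is to turn on device-placement logging. This is a minimal sketch using tf.debugging.set_log_device_placement; the 256x256 matrices are an arbitrary choice for illustration:

```python
import tensorflow as tf

# Log the device every op is placed on; set this before any ops run.
tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available:", len(gpus))

# A small matmul: with a visible GPU, the placement log should mention GPU:0.
a = tf.random.uniform((256, 256))
b = tf.random.uniform((256, 256))
c = tf.matmul(a, b)

# In eager mode the result tensor also reports the device it lives on.
print(c.device)
```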
But I'm not seeing any major improvement in training performance. It's about the same as it was before when training on the CPU, and I'd expect my RTX 3060 to provide at least some boost.
Should I be seeing an improvement when training a relatively simple Sequential model?
EDIT: If I disable the GPU and train on the CPU only, using:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
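For reference, TensorFlow also offers an in-process way to hide the GPU, which does not depend on the environment variable being set before tensorflow is imported. A sketch, assuming TF 2.x where tf.config.set_visible_devices is available:

```python
import tensorflow as tf

# Hide every GPU from TensorFlow. This must run before any op initializes
# the GPU; otherwise TensorFlow raises a RuntimeError.
try:
    tf.config.set_visible_devices([], 'GPU')
except RuntimeError as e:
    print("Too late to change visible devices:", e)

# Should print an empty list: everything now runs on the CPU.
print("Visible GPUs:", tf.config.get_visible_devices('GPU'))
```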
Training the model on the CPU takes 21.14 seconds; on the GPU it takes 57.59(!!!) seconds.
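A rough per-step breakdown puts these numbers in perspective. This sketch assumes validation_split=0.2 holds out the last 12,000 of the 60,000 images (leaving 48,000 for training) and divides the reported wall-clock times by the number of optimizer steps. Because the timer in the code also covers validation and model.evaluate, the per-step figures are upper bounds:

```python
import math

# Figures taken from the question's code and timings.
train_samples = 60000 * 80 // 100                  # 48,000 images after validation_split=0.2
steps_per_epoch = math.ceil(train_samples / 128)   # batches per epoch at BATCH_SIZE=128
total_steps = steps_per_epoch * 50                 # optimizer steps over EPOCHS=50

# Trainable parameters of 784 -> 128 -> 128 -> 10 (weights + biases; Dropout adds none).
params = (784 * 128 + 128) + (128 * 128 + 128) + (128 * 10 + 10)

ms_per_step = {}
for device, seconds in [("CPU", 21.14), ("GPU", 57.59)]:
    ms_per_step[device] = seconds / total_steps * 1000
    print(f"{device}: {ms_per_step[device]:.2f} ms per step")

print(f"{params:,} parameters, {total_steps:,} steps")
```

With only about 118k parameters, each batch of 128 is very little arithmetic. A plausible but unverified explanation is that fixed per-step costs (kernel launches, host-to-GPU copies) dominate on the GPU, while the CPU handles such small matrix multiplies without that overhead.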
I also don't see the GPU load increase during training the way I would expect.
Here is the code for the model I'm training:
import datetime as dt
# import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
from tensorflow import keras
import numpy as np
EPOCHS = 50
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # Number of outputs
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2
DROPOUT = 0.3
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# X_train is 60,000 rows of 28x28 values
# Reshape it to 60,000x784
RESHAPED = 784
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize inputs between 0 and 1
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# One-hot encoding of labels
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
# Build the model
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(N_HIDDEN, input_shape=(RESHAPED,),
name='dense_layer', activation='relu'))
model.add(keras.layers.Dropout(DROPOUT))
# input_shape is only used by the first layer, so it is omitted below
model.add(keras.layers.Dense(N_HIDDEN, name='dense_layer2', activation='relu'))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(NB_CLASSES, name='dense_layer3', activation='softmax'))
# Print summary of the model
model.summary()
# Compiling the model
model.compile(optimizer='Adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
t = dt.datetime.now()
# Training the model
model.fit(X_train, Y_train, batch_size=BATCH_SIZE,
epochs=EPOCHS, verbose=VERBOSE,
validation_split=VALIDATION_SPLIT)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, Y_test)
print('Test accuracy: ', test_acc)
# Note: this elapsed time also includes validation and model.evaluate, not just fit
print(f'Training elapsed: {dt.datetime.now()-t}')
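A sketch for testing the per-step-overhead hypothesis: time each epoch with a small callback and compare several batch sizes, discarding the first epoch (which includes graph tracing and CUDA library start-up). It uses random stand-in data with MNIST's shapes so it runs without a download. EpochTimer and build_model are hypothetical helper names, and the batch sizes are arbitrary. Running it once normally and once with CUDA_VISIBLE_DEVICES=-1 would show whether the GPU pulls ahead as the batch grows:

```python
import time

import numpy as np
from tensorflow import keras


class EpochTimer(keras.callbacks.Callback):
    """Hypothetical helper: records wall-clock seconds for each epoch."""

    def on_train_begin(self, logs=None):
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.perf_counter() - self._start)


def build_model():
    # Same architecture as the question's model.
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


# Random stand-in data with MNIST's shapes (not real images).
rng = np.random.default_rng(0)
x = rng.random((60000, 784), dtype=np.float32)
y = keras.utils.to_categorical(rng.integers(0, 10, size=60000), 10)

results = {}
for batch_size in (128, 1024, 4096):
    timer = EpochTimer()
    build_model().fit(x, y, batch_size=batch_size, epochs=3, verbose=0,
                      callbacks=[timer])
    # Skip epoch 1, which includes tracing and library start-up costs.
    results[batch_size] = float(np.mean(timer.epoch_times[1:]))
    print(f"batch {batch_size}: {results[batch_size]:.2f} s/epoch (steady state)")
```

Larger batches also mean fewer weight updates per epoch, so accuracy would need to be rechecked rather than assumed to stay the same.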