Recently I've been fiddling around with TensorFlow 2 again. I hadn't used it in 2-3 years, but I know my GPU works with it. Most recently I had been working with PyTorch, and when I compared my machine against someone else's without a GPU running on Colab, it was night and day: my machine was so much faster.
But for some reason, as I've been running some small tests, the training has felt too slow. And when I checked the speed by switching the device to CPU, it was the same.
Some context about my machine: I set up a new conda environment to mirror the one recommended for the TensorFlow Developer Exam, so I'm running TF 2.9.0 with Python 3.8.0 on a GeForce RTX 2060, under Windows 10. I did not re-download or update my CUDA libraries from a few years ago, but I checked and TensorFlow recognizes my GPU.
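To at least rule out an obvious mismatch, I also printed the CUDA/cuDNN versions this TF build was compiled against. I believe tf.sysconfig.get_build_info() is the right call for this, though I'm not sure it tells you whether the locally installed CUDA actually matches:
import tensorflow as tf

# Versions this TF wheel was built against (not necessarily what is installed locally)
build = tf.sysconfig.get_build_info()
print(build.get("cuda_version"), build.get("cudnn_version"), build.get("is_cuda_build"))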
Here is the code for loading TensorFlow and checking for the GPU:
import tensorflow as tf
from tensorflow.python.client import device_lib
print(tf.__version__)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
device_lib.list_local_devices()
And the result:
2.9.0
Num GPUs Available: 1
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 18213716215175288244
xla_global_id: -1,
name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4160159744
locality {
bus_id: 1
links {
}
}
incarnation: 14308843300195357737
physical_device_desc: "device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5"
xla_global_id: 416903419]
As you can see, the graphics card is being recognized. I ran a basic neural-network regression test based on some YouTube videos I've been watching. It uses insurance data and it's pretty small: only about 1,000 training samples and 11 features after transformation. It's all plain numeric data, no images or anything complicated. A very simple regression problem.
Here is the data download and initial transformation:
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
# Read in the insurance data
insurance = pd.read_csv('https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv')
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
# Create a column transformer
ct = make_column_transformer(
    (MinMaxScaler(), ['age', 'bmi', 'children']),
    (OneHotEncoder(), ['sex', 'smoker', 'region'])
)
# Create X and y
X = insurance.drop("charges", axis=1)
y = insurance.charges
# Build train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the column transformer to the training data
ct.fit(X_train)
# Transform training and test data with normalization and one hot encoding
X_train_trans = ct.transform(X_train)
X_test_trans = ct.transform(X_test)
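Just to underline how small this is, here is a quick shape check on the transformed data (I believe the ColumnTransformer returns a dense NumPy array here, though it can also return a sparse matrix depending on the data):
# Sanity check: the transformed training data is tiny either way
print(type(X_train_trans))   # dense numpy array or scipy sparse matrix
print(X_train_trans.shape)   # roughly (1000, 11)
print(y_train.shape)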
And here is how I built my neural network. Again, I kept it very simple. I used the functional API here since I'm trying to learn it, but the speed ends up being the same with the Sequential API (I've sketched the Sequential version I compared against right after this block).
tf.random.set_seed(42)
inputs = tf.keras.Input(shape=X_train_trans[1].shape)
x = tf.keras.layers.Dense(128, activation='relu')(inputs)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(1)(x)
ins_model_4 = tf.keras.Model(inputs, outputs)
ins_model_4.compile(loss='mae',
                    optimizer='adam',
                    metrics=['mae'])
history = ins_model_4.fit(X_train_trans, y_train, epochs=200, verbose=1)
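For reference, the Sequential version I compared against was essentially this (reconstructed from memory, so treat it as a sketch rather than the exact code I ran):
tf.random.set_seed(42)

# Sequential equivalent of the functional model above (same layers, same compile settings)
ins_model_seq = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
ins_model_seq.compile(loss='mae', optimizer='adam', metrics=['mae'])
history_seq = ins_model_seq.fit(X_train_trans, y_train, epochs=200, verbose=1)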
As you can see, it's a very shallow model, but for some reason it took about 30 seconds to train. That felt too long; it should be blazing fast. So I then re-ran the fit under tf.device with both the CPU and the GPU selected, like this:
# with the GPU selected
with tf.device('/gpu:0'):
    history = ins_model_4.fit(X_train_trans, y_train, epochs=200, verbose=1)

# with the CPU selected
with tf.device('/cpu:0'):
    history = ins_model_4.fit(X_train_trans, y_train, epochs=200, verbose=1)
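To put a rough number on the comparison, this is the kind of quick wall-clock check I mean (a sketch, not my exact code; if I understand it right, tf.debugging.set_log_device_placement should also show where each op actually runs):
import time

# Sketch: log op placement and time fit() on each device
# (placement logging is very verbose, so even a few epochs are enough to see it)
tf.debugging.set_log_device_placement(True)

for device in ('/gpu:0', '/cpu:0'):
    start = time.perf_counter()
    with tf.device(device):
        ins_model_4.fit(X_train_trans, y_train, epochs=200, verbose=0)
    print(device, "->", round(time.perf_counter() - start, 1), "seconds")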
And I found the results are the same. What is going on here? I have a few guesses.
Do I need to download the newer CUDA files? Is it possible for TF to recognize a GPU but not actually use it? Is there something about the data I'm using, or the regression problem I've defined, that is for some reason slow on my network? Did I code the tf.device part wrong? I could really use some help resolving this.