I am experimenting with reinforcement learning in Python. I am using TensorFlow 2.1, and my machine has multiple GPUs (CUDA 10.2, driver 440.59). I am placing operations on my GPUs with tf.device(); I am not using tf.distribute.Strategy. I am building my model with:

with tf.device(USE_GPU):
    model = build_keras_Seq()
where build_keras_Seq() uses the functional API to create the model:
model = tf.keras.Model(inputs=inputs, outputs=outputs)
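For context, here is a minimal sketch of what my build_keras_Seq() looks like (the exact layer stack is illustrative; NUM_ACTIONS and the conv/dense sizes are placeholders, and the image size comes from the shape comment below):

```python
import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH = 165, 160   # from the shape comment below
NUM_ACTIONS = 4                    # placeholder action-space size

def build_keras_Seq():
    # Functional API: wire inputs to outputs explicitly, then wrap in a Model
    inputs = tf.keras.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1))
    x = tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(NUM_ACTIONS)(x)  # one Q-value per action
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```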
All my inputs are tensors allocated on the same GPU as my model.
with tf.device(USE_GPU):
    self.images_gpu = tf.zeros(shape=(1,IMG_HEIGHT,IMG_WIDTH), dtype=tf.int16) # (165, 160, 1)
    self.actions_gpu = tf.zeros(shape=(1,), dtype=tf.int16)
    self.rewards_gpu = tf.zeros(shape=(1,), dtype=tf.int16)
    self.dones_gpu = tf.zeros(shape=(1,), dtype=tf.int16)
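I verify the placement by inspecting the .device attribute of each tensor (USE_GPU here is a placeholder device string; on a machine without that GPU, soft placement would silently fall back to the CPU):

```python
import tensorflow as tf

USE_GPU = "/GPU:1"  # placeholder; matches the device shown in the logs below

with tf.device(USE_GPU):
    images_gpu = tf.zeros(shape=(1, 165, 160), dtype=tf.int16)

# Prints the full device string, e.g. /job:localhost/replica:0/task:0/device:GPU:1
print(images_gpu.device)
```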
My target is calculated by a @tf.function that implements Expected SARSA and returns a tensor on the GPU:
target_train is on device: /job:localhost/replica:0/task:0/device:GPU:1
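The target computation is roughly the following sketch (the actual function is larger; NUM_ACTIONS, GAMMA, and EPSILON are placeholder values, and I assume an epsilon-greedy behavior policy):

```python
import tensorflow as tf

NUM_ACTIONS = 4   # placeholder action-space size
GAMMA = 0.99      # placeholder discount factor
EPSILON = 0.1     # placeholder exploration rate

@tf.function
def expected_sarsa_target(q_next, rewards, dones):
    """Expected SARSA target: r + gamma * E_pi[Q(s', a)] for non-terminal steps.

    q_next:  (batch, NUM_ACTIONS) Q-values for the next states
    rewards: (batch,) float32
    dones:   (batch,) float32, 1.0 where the episode ended
    """
    # epsilon-greedy policy probabilities over next-state actions:
    # greedy action gets 1 - eps + eps/N, every other action gets eps/N
    greedy = tf.one_hot(tf.argmax(q_next, axis=1), NUM_ACTIONS)
    probs = greedy * (1.0 - EPSILON) + EPSILON / NUM_ACTIONS
    expected_q = tf.reduce_sum(probs * q_next, axis=1)
    return rewards + GAMMA * (1.0 - dones) * expected_q
```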
When I call model.fit, many operations appear to execute on the CPU (see the log below), resulting in poor performance. My understanding is that the tensors are moved back to the CPU before being sent to the GPU. Any idea how to prevent this behavior and feed the tensors directly from the GPU to the model hosted on the same GPU?
2020-02-23 09:49:32.100259: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.101114: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.108407: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op MapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.109087: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op PrefetchDataset in device /job:localhost/replica:0/task:0/device:GPU:1
2020-02-23 09:49:32.117795: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op FlatMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.118524: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op TensorDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.119764: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.120133: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ZipDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.128411: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ParallelMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-02-23 09:49:32.129839: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0
Thanks!
N