If the dataset is small enough to fit in GPU memory, is it possible in TensorFlow to allocate it all on the GPU up front and then train without any data transfers between CPU and GPU? It seems to me that this is not possible with tf.data, and that the data transfer is not under the programmer's control.
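For concreteness, here is a minimal sketch of the kind of setup I have in mind (the explicit tf.device placement and the small Dense model are just illustrative assumptions, not my actual code):

```python
import tensorflow as tf

# CIFAR-10 is ~600 MB as float32, so it fits comfortably on most GPUs.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0

# Attempt to place the whole dataset on the GPU once, up front.
with tf.device("/GPU:0"):
    x_gpu = tf.constant(x_train)
    y_gpu = tf.constant(y_train)

# Even so, tf.data appears to run its pipeline on the host,
# so I cannot tell whether the batches actually stay on the device.
dataset = tf.data.Dataset.from_tensor_slices((x_gpu, y_gpu)).batch(128)

# Placeholder model, just to make the example self-contained.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(dataset, epochs=5)
```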
Analyzing the GPU workload during training on CIFAR-10, utilization only reaches about 75%, but I would expect it to reach 100% given that the dataset fits in GPU memory. Profiling with TensorBoard, I also see a lot of Send operations. (I found a similar question here, but it is quite old and predates tf.data.)