I'm working with TensorFlow and I want to speed up the prediction phase of a pre-trained Keras model (I'm not interested in the training phase) by using the CPU and one GPU simultaneously.
I tried creating two threads that feed two different TensorFlow sessions (one running on the CPU and the other on the GPU). Each thread feeds a fixed number of batches in a loop (e.g., out of 100 batches total, assign 20 to the CPU and 80 to the GPU, or any other split), and the results are combined afterwards. Ideally the split would be determined automatically.
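For concreteness, the split itself is plain list slicing (a sketch; batches is my list of input arrays):

num_batches_cpu = 20                      # e.g. 20 of 100 batches for the CPU
cpu_batches = batches[:num_batches_cpu]   # consumed by the CPU thread
gpu_batches = batches[num_batches_cpu:]   # consumed by the GPU thread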
However, even in this scenario the batches seem to be fed synchronously: even when I send only a few batches to the CPU and compute all the others on the GPU (so the GPU is the bottleneck), the overall prediction time is always higher than in a test run using the GPU alone.
I would expect it to be faster, because when only the GPU is working the CPU usage sits at about 20-30%, so there is spare CPU capacity available to speed up the computation.
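For reference, the comparison is a plain wall-clock measurement around the two configurations (a sketch; split_cpu_gpu is the function defined further down):

import time

start = time.time()
split_cpu_gpu(batches, 20, tensor_cpu, tensor_gpu)   # 20 batches on CPU, 80 on GPU
print('CPU+GPU: %.2f s' % (time.time() - start))

start = time.time()
split_cpu_gpu(batches, 0, tensor_cpu, tensor_gpu)    # everything on the GPU
print('GPU only: %.2f s' % (time.time() - start))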
I have read a lot of discussions, but they all deal with parallelism across multiple GPUs, not between a GPU and the CPU.
Here is a sample of the code I have written: the tensor_cpu and tensor_gpu objects are loaded from the same Keras model in this way:
import tensorflow as tf
from keras.models import load_model

# x is the shared input placeholder; the shape here is just an example
x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])

with tf.device('/gpu:0'):
    model_gpu = load_model('model1.h5')
    tensor_gpu = model_gpu(x)
with tf.device('/cpu:0'):
    model_cpu = load_model('model1.h5')
    tensor_cpu = model_cpu(x)
Then the prediction is done as follows:
def predict_on_device(session, predict_tensor, batches):
    # run the model on this thread's share of the batches
    # (in my full code the outputs are collected and merged;
    # they are dropped here to keep the example short)
    for batch in batches:
        session.run(predict_tensor, feed_dict={x: batch})
from threading import Thread

def split_cpu_gpu(batches, num_batches_cpu, tensor_cpu, tensor_gpu):
    # one session per device; log_device_placement confirms where ops run
    session1 = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    session1.run(tf.global_variables_initializer())
    session2 = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    session2.run(tf.global_variables_initializer())

    coord = tf.train.Coordinator()

    # the first num_batches_cpu batches go to the CPU, the rest to the GPU
    t_cpu = Thread(target=predict_on_device,
                   args=(session1, tensor_cpu, batches[:num_batches_cpu]))
    t_gpu = Thread(target=predict_on_device,
                   args=(session2, tensor_gpu, batches[num_batches_cpu:]))

    t_cpu.start()
    t_gpu.start()

    coord.join([t_cpu, t_gpu])

    session1.close()
    session2.close()
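And the call site, for completeness (the input shapes here are made-up stand-ins matching the example placeholder above, not my real data):

import numpy as np

# 100 dummy batches of 32 samples each; my real data has the same layout
batches = [np.random.rand(32, 224, 224, 3).astype(np.float32)
           for _ in range(100)]

split_cpu_gpu(batches, 20, tensor_cpu, tensor_gpu)   # 20 on CPU, 80 on GPU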
How can I achieve this CPU/GPU parallelization? I think I'm missing something.
Any kind of help would be much appreciated!