I would like to know the recommended way to wait for a GPU operation to complete in TensorFlow Eager mode.
Operations placed on a GPU device appear to execute asynchronously (I could not find this stated in the TensorFlow documentation, but it is consistent with the behavior I observe). This matters, for example, when timing GPU ops with time.time()*, since we need to make sure the ops have completed before recording the end time.
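A small sketch that makes the asynchronous behavior visible, assuming TensorFlow 2.x in its default eager mode (the matrix size is arbitrary). It records how long the matmul call takes to return and, separately, how long until its result can actually be read on the host. On a GPU the first number is typically much smaller than the second, because the call can return as soon as the kernel is enqueued; on a CPU-only machine the two are usually close.

```python
import time
import tensorflow as tf

x = tf.random.normal([2048, 2048])
tf.matmul(x, x)[0, 0].numpy()  # warm-up: keep one-time setup costs out of the measurement

t0 = time.perf_counter()
y = tf.matmul(x, x)  # on a GPU, this may return before the kernel has finished
t_dispatch = time.perf_counter() - t0
_ = y[0, 0].numpy()  # blocks until y has been computed, then copies one element to the host
t_done = time.perf_counter() - t0

print(f"returned after {t_dispatch:.6f} s, result ready after {t_done:.6f} s")
```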
The only way I could find to ensure a GPU operation has been executed is to explicitly copy (some of) the output data to the CPU.
For example (assuming all operations are carried out on the GPU):
```python
import time
import tensorflow as tf

t0 = time.time()
result = f(input_tensor)  # carry out some operations on the input
_ = result[0].numpy()  # copies result[0] (a single element if result is 1-D) to the CPU
t1 = time.time()
print("runtime =", t1 - t0)
```
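The same copy-to-host idea can be packaged as a reusable timing helper. This is a minimal sketch assuming f is an eager function that returns a single tf.Tensor; the names time_eager and matmul_chain are hypothetical. Reducing the output to a scalar on the device before calling .numpy() means only one number crosses to the host, and because the reduction reads all of f's output it cannot complete before f's ops do. The reduction itself is still a small extra op inside the timed region, and the warm-up call keeps one-time setup costs out of the average.

```python
import time
import tensorflow as tf


def time_eager(f, x, warmup=1, repeats=10):
    """Return the mean wall-clock seconds per call of f(x).

    Each call is synchronized by copying a scalar that depends on f's
    output back to the host, which blocks until the device is done.
    """
    for _ in range(warmup):
        tf.reduce_sum(f(x)).numpy()  # one-time costs (allocation, kernel setup)
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        tf.reduce_sum(f(x)).numpy()  # forces f(x) to finish before the next call
    return (time.perf_counter() - start) / repeats


def matmul_chain(a):
    # Hypothetical workload: a few chained matrix products.
    for _ in range(5):
        a = tf.tanh(tf.matmul(a, a))  # tanh keeps values bounded
    return a


x = tf.random.normal([512, 512])
elapsed = time_eager(matmul_chain, x)
print(f"mean runtime: {elapsed:.6f} s")
```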
Since copying data to the CPU incurs some overhead, it would be nice to have a way to ensure the GPU has finished executing without copying. Is there such a way? Perhaps something like JAX's block_until_ready()?
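One candidate, assuming a recent TensorFlow 2.x release (I believe 2.12 or later), is tf.test.experimental.sync_devices(), which is documented as blocking until pending ops on all devices have finished, without copying any tensor data. Unlike JAX's block_until_ready(), it waits on every device rather than on one specific array, and it may only be called eagerly (not inside a tf.function). The sketch below looks the function up defensively and falls back to the scalar-copy approach on older versions; the helper name wait_for_device is hypothetical.

```python
import time
import tensorflow as tf

# Assumption: tf.test.experimental.sync_devices exists only in newer TF releases,
# so look it up defensively instead of failing on older versions.
_sync_devices = getattr(getattr(tf.test, "experimental", None), "sync_devices", None)


def wait_for_device(result):
    """Block until pending device work (including `result`) has finished."""
    if _sync_devices is not None:
        _sync_devices()  # waits for all devices; copies no tensor data
    else:
        # Fallback for older TF: copy one scalar that depends on `result`.
        tf.reduce_sum(result).numpy()


x = tf.random.normal([1024, 1024])
t0 = time.perf_counter()
result = tf.matmul(x, x)
wait_for_device(result)
runtime = time.perf_counter() - t0
print("runtime =", runtime)
```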
*I realize that time.time() may not be the best way to time GPU operations in Eager mode.