What's the best way to block on a GPU operation in TensorFlow's Eager mode?

Question

I would like to know the recommended way to wait for a GPU operation to complete in TensorFlow Eager mode.

Operations placed on a GPU device appear to execute asynchronously (I could not find this stated in the TensorFlow documentation, but it is consistent with the behavior I observe). This matters, for example, when timing GPU ops using time.time()*, since the ops must have completed before the end time is recorded.

The only way I could find to ensure a GPU operation has been executed is to explicitly copy (some of) the output data to the CPU.

For example (assuming all operations are carried out on the GPU):

import time

t0 = time.time()
result = f(input_tensor)  # carry out some operations on the input
_ = result[0].numpy()  # copies a single element of the output tensor to the CPU, which forces a wait
t1 = time.time()
print("runtime =", t1 - t0)

Since copying data to the CPU incurs some overhead, it would be nice to have a way to ensure the GPU has finished executing without copying. Is there such a way? Perhaps something like JAX's block_until_ready()?

*I realize that using time.time() may not be the best way to time GPU operations in Eager mode.
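
On the timing footnote: below is a minimal sketch of a timing helper, not a TensorFlow API. It assumes the blocking step is passed in as a callable, so it works with whatever synchronization turns out to be available. The names `time_blocking`, `fn`, `sync`, `warmup` and `repeats` are hypothetical and chosen only for illustration. It uses `time.perf_counter()`, which is monotonic and higher resolution than `time.time()`, and runs a few warm-up iterations so that one-time setup costs are not counted.

```python
import time


def time_blocking(fn, sync, warmup=1, repeats=5):
    """Return the best wall-clock time in seconds of fn() over `repeats` runs.

    fn   -- zero-argument callable that launches the work
    sync -- hypothetical hook: takes fn's result and returns only once
            that result has actually been computed
    """
    # Warm-up runs absorb one-time costs (allocation, kernel setup, caching).
    for _ in range(warmup):
        sync(fn())

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        sync(fn())                   # launch the work, then wait for it
        best = min(best, time.perf_counter() - start)
    return best
```

With the names from the snippet above, this would be called as `time_blocking(lambda: f(input_tensor), lambda r: r[0].numpy())`. The single-element copy is still the blocking step here, so the helper does not remove that overhead; it only keeps the measurement itself consistent across runs.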

Would returning just a single element result[0][0,0,0...] work? — y.selivonchyk, Jun 19 '19 at 04:56
This is what I meant by "copies a single element". Have edited to clarify. I guess I don't know how much overhead a single-element copy incurs. I'm assuming there are some setup costs for the copy, so I'm specifically looking for a solution that copies no data at all. — thatistosay, Jun 20 '19 at 19:58
I think you can just compute `tf.identity(your_op)`. The identity operation then depends on `your_op`, so to compute it, `your_op` must be computed first. — David Parks, Jun 20 '19 at 21:20
@DavidParks Just tried that, but it doesn't work. Presumably the identity op gets "executed" on the GPU too, so this is still asynchronous. — thatistosay, Jun 21 '19 at 20:39
You might want to check out the tensorflow profiler breakdown: https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d — David Parks, Jun 22 '19 at 01:15
@DavidParks That's a handy blog post, but it doesn't explain how to profile in Eager mode. See [my other question](https://stackoverflow.com/q/56693208/11666760)! — thatistosay, Jul 18 '19 at 13:48

0 Answers