In the following code, it is absolutely imperative for me to execute the complete function on the GPU without a single jump back to the CPU, because I have 4 CPU cores but 1200 CUDA cores. In theory this should be possible, since the TensorFlow feed-forwards, the if statements, and the variable assignments can all be done on the GPU (I have an NVIDIA GTX 1060).
The problem I'm facing is that TensorFlow 2.0 does this assignment to GPU and CPU automatically in the backend and doesn't document which of its ops are GPU compatible. When I run the following function with the device set to GPU, I get

    parallel_func could not be transformed and will be staged without change.

and it runs sequentially on the GPU.
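For reference, here is a minimal sketch of how I've been inspecting placements with TensorFlow's device-placement logging (the tensors below are stand-ins, not my real class data):

    import tensorflow as tf

    # Print one line per op execution naming the device it ran on, e.g.
    # "Executing op MatMul in device /job:localhost/.../device:GPU:0"
    tf.debugging.set_log_device_placement(True)

    a = tf.random.uniform([2, 2])
    b = tf.matmul(a, a)  # should report GPU:0 on a CUDA build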
My question is: where should I use tf.device? Which parts of the code will be converted by AutoGraph into GPU code, and which parts will remain on the CPU? How can I convert those parts to the GPU as well?
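The only pattern I know for tf.device is an explicit scope like the sketch below (step and the shapes are made up for illustration); what I don't understand is whether this also forces the loop control flow itself onto the GPU:

    import tensorflow as tf

    tf.config.set_soft_device_placement(True)  # fall back to CPU where no GPU kernel exists

    @tf.function
    def step(x):
        # Every op created inside this scope is pinned to the first GPU;
        # ops with no GPU kernel would raise without soft placement.
        with tf.device('/GPU:0'):
            return tf.matmul(x, x)

    print(step(tf.random.uniform([4, 4])))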
    @tf.function
    def parallel_func(self):
        for i in tf.range(114):              # want this parallel on GPU
            for count in range(320):         # want this sequential on GPU
                retrievedValue = self.data[i][count]
                if self.var[i] == 1:
                    self.value[i] = retrievedValue    # assigns, if/else
                elif self.var[i] == -1:               # some links to class data through
                    self.value[i] = -retrievedValue   # self.data, self.a and self.b
                state = tf.reshape(tf.Variable([self.a[i], self.b[i][count]]), [-1, 2])
                if self.workerSwitch == False:
                    action = tf.math.argmax(self.feed_forward(i, count, state))
                else:
                    action = tf.math.argmax(self.worker_feed_forward(i, count, state))
                if action == 1 or action == -1:
                    self.actionCount += 1
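In case it clarifies the shape of parallelism I'm after, here is a toy sketch using tf.map_fn (data, var and the per-row accumulation are stand-ins for my real attributes and logic, not the actual implementation): the 114 outer iterations get dispatched concurrently inside one graph, while each inner loop stays sequential:

    import tensorflow as tf

    data = tf.random.uniform([114, 320])   # stand-in for self.data
    var = tf.ones([114], dtype=tf.int32)   # stand-in for self.var

    @tf.function
    def parallel_over_rows(data, var):
        def per_row(args):
            row, v = args
            value = tf.constant(0.0)
            for count in tf.range(320):    # sequential inner loop per row
                retrieved = row[count]
                value += tf.where(v == 1, retrieved, -retrieved)
            return value
        # outer dimension dispatched concurrently, no hop back to Python
        return tf.map_fn(per_row, (data, var),
                         fn_output_signature=tf.float32,
                         parallel_iterations=114)

    print(parallel_over_rows(data, var).shape)   # (114,)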