I am trying to update the weights once per epoch, but I am processing the data in batches. The problem is that, to normalize the loss, I need the tape to record operations outside the training loop (so the accumulated loss can be tracked and normalized). But when I do this, the training time is huge.
I think the tape accumulates the operations from all batches into one graph and computes the gradients through all of them at the end.
I have tried opening the tape both outside the for loop and inside the for loop, and the latter is much faster than the former. I am confused about why this happens, because in both cases my model's trainable variables and the loss remain the same.
# Very slow
loss_value = 0
batches = 0
with tf.GradientTape() as tape:
    for inputs, min_seq in zip(dataset, minutes_sequence):
        temp_loss_value = my_loss_function(inputs, min_seq)
        batches += 1
        loss_value = loss_value + temp_loss_value
# The following line takes a huge amount of time.
grads = tape.gradient(loss_value, model.trainable_variables)
# Very fast
loss_value = 0
batches = 0
for inputs, min_seq in zip(dataset, minutes_sequence):
    with tf.GradientTape() as tape:
        temp_loss_value = my_loss_function(inputs, min_seq)
        batches += 1
        loss_value = loss_value + temp_loss_value

# If I add the following line, the graph breaks because the division
# happens outside the tape's scope.
loss_value = loss_value / batches
grads = tape.gradient(loss_value, model.trainable_variables)
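The only workaround I can think of for the normalization is to divide each batch loss inside the tape's scope, assuming the number of batches is known up front (using len(minutes_sequence) here is my assumption about the data):

# Sketch: keep the division on the tape by normalizing each batch
# loss individually; n_batches must be known before the loop starts.
n_batches = len(minutes_sequence)
for inputs, min_seq in zip(dataset, minutes_sequence):
    with tf.GradientTape() as tape:
        batch_loss = my_loss_function(inputs, min_seq) / n_batches
    # These per-batch gradients would still have to be accumulated
    # across the epoch (see the sketch after my question below).
    grads = tape.gradient(batch_loss, model.trainable_variables)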
When I declare tf.GradientTape() inside the for loop it is very fast, but when I declare it outside the loop it is slow.
P.S. This is for a custom loss, and the architecture contains just one hidden layer of size 10.
I want to know what difference tf.GradientTape()'s position makes, and how it should be used to update the weights once per epoch on a batched dataset.
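For context, this is the kind of per-epoch update I am after. Below is a minimal sketch of my current idea, assuming a Keras model, a plain SGD optimizer, and my_loss_function as above (num_epochs is a placeholder). It accumulates per-batch gradients and applies their mean once per epoch; by linearity of the gradient, averaging the per-batch gradients should match normalizing the summed loss by the batch count.

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for epoch in range(num_epochs):
    # One accumulator per trainable variable, reset each epoch.
    accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]
    batches = 0
    for inputs, min_seq in zip(dataset, minutes_sequence):
        # A fresh, short-lived tape per batch keeps the recorded
        # graph small, so each tape.gradient call stays cheap.
        with tf.GradientTape() as tape:
            batch_loss = my_loss_function(inputs, min_seq)
        grads = tape.gradient(batch_loss, model.trainable_variables)
        accum_grads = [a + g for a, g in zip(accum_grads, grads)]
        batches += 1
    # Average the gradients (equivalent to dividing the summed loss
    # by the batch count) and update the weights once per epoch.
    mean_grads = [g / batches for g in accum_grads]
    optimizer.apply_gradients(zip(mean_grads, model.trainable_variables))

Is something like this the intended pattern, or is there a better way?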