
I have been working on a model whose training loop is wrapped in tf.function (I get OOM errors when running eagerly), and training seems to run fine. However, I cannot access the tensor values returned by my custom training function (below):

def train_step(inputs, target):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        curr_loss = lovasz_softmax_flat(predictions, target)

    gradients = tape.gradient(curr_loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    
    # Need to access this value
    return curr_loss

A simplified version of my 'umbrella' training loop is as follows:

@tf.function
def train_loop():
    for epoch in range(EPOCHS):
        for tr_file in train_files:

            tr_inputs = preprocess(tr_file)
            
            tr_loss = train_step(tr_inputs, target)
            print(tr_loss.numpy())
            

When I do try to print out the loss value, I end up with the following error:

AttributeError: 'Tensor' object has no attribute 'numpy'

I also tried using tf.print() as follows:

tf.print("Loss: ", tr_loss, output_stream=sys.stdout)

But nothing seems to appear on the terminal. Any suggestions?

sbab94

1 Answer


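You can't call .numpy() on a tensor in graph mode: inside a tf.function, tensors are symbolic and have no concrete value until the graph runs. Instead, create a tf.metrics object outside of the function and update it inside the function.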

mean_loss_values = tf.metrics.Mean()

def train_step(inputs, target):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        curr_loss = lovasz_softmax_flat(predictions, target)

    gradients = tape.gradient(curr_loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))

    # look below
    mean_loss_values(curr_loss)
    # or mean_loss_values.update_state(curr_loss)
    
    # Need to access this value
    return curr_loss

Then later in your code:

mean_loss_values.result()
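For reference, here is a minimal self-contained sketch of the same pattern. The tiny Dense model, the SGD optimizer, the squared-error loss, and the random data are hypothetical stand-ins for the model, opt, and lovasz_softmax_flat from the question. It assumes a TensorFlow version where metrics expose reset_state(). The metric is updated inside the tf.function and read back in eager code, where a concrete value is available.

```python
import tensorflow as tf

# Hypothetical stand-ins for the model, optimizer, and data in the question.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)

# The metric is created once, outside the tf.function.
mean_loss_values = tf.keras.metrics.Mean()


@tf.function
def train_step(inputs, target):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        # Placeholder loss; the question uses lovasz_softmax_flat here.
        curr_loss = tf.reduce_mean(tf.square(predictions - target))

    gradients = tape.gradient(curr_loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))

    # Accumulate the loss in the metric's variables while in graph mode.
    mean_loss_values.update_state(curr_loss)
    return curr_loss


inputs = tf.random.normal((8, 4))
target = tf.random.normal((8, 1))

for epoch in range(2):
    for _ in range(5):
        last_loss = train_step(inputs, target)

    # Outside the tf.function the result is an eager tensor with a real value.
    epoch_mean = float(mean_loss_values.result())
    print(f"epoch {epoch}: mean loss {epoch_mean:.4f}")
    mean_loss_values.reset_state()  # start the next epoch from zero
```

Note that the value returned by train_step is also an eager tensor once it reaches the Python loop, so last_loss.numpy() works there too. The AttributeError only occurs when .numpy() is called inside a function that is itself being traced.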
Nicolas Gervais
  • Thanks for the help, Nicolas. But in my case the metrics have to be evaluated within the train_loop() function (tf.function), as the values are to be written regularly by a tf.summary writer for TensorBoard. Also, due to the complexity of my model, I cannot run either the train_loop() or the train_step() function eagerly. – sbab94 Jan 07 '21 at 10:29
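On the TensorBoard requirement raised in this comment, one possible approach is to call tf.summary.scalar directly inside the tf.function, which does not need .numpy() at all. The sketch below is an assumption-laden illustration rather than a drop-in fix: the log directory, the placeholder loss, and the num_steps argument are hypothetical, and the loop uses tf.range so that it runs inside the graph. The writer is created outside the function, and the step is passed as an int64 tensor.

```python
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()  # hypothetical log directory for this sketch

# Create the writer once, outside the tf.function.
writer = tf.summary.create_file_writer(log_dir)


@tf.function
def train_loop(num_steps):
    with writer.as_default():
        # tf.range keeps the loop inside the graph instead of unrolling it.
        for step in tf.range(num_steps, dtype=tf.int64):
            # Placeholder loss; in the question this would be train_step(...).
            loss = 1.0 / tf.cast(step + 1, tf.float32)
            # Works on symbolic tensors, so no .numpy() call is needed.
            tf.summary.scalar("loss", loss, step=step)
            tf.print("step", step, "loss", loss)


train_loop(tf.constant(3, dtype=tf.int64))
writer.flush()  # make sure the events are written to disk
```

With this layout the scalar is logged on every iteration while the whole loop stays in graph mode, and TensorBoard can be pointed at log_dir to view it.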