I have a custom estimator and am trying to use some custom metrics during evaluation. However, whenever I add these metrics to evaluation via eval_metric_ops, evaluation becomes really slow (much slower than training, which is actually computing the same metrics). If I don't add the metrics there, then I only see metrics in TensorBoard for training, not for evaluation.
What is the right way to add a custom metric to a custom estimator so that it is computed and saved during evaluation?
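Concretely, the slowdown shows up when I call evaluate. A condensed sketch of how I drive it (model_dir, params and eval_input_fn here are placeholders, not my real values; crnn_model is shown further down):

estimator = tf.estimator.Estimator(model_fn=crnn_model,
                                   model_dir='/tmp/crnn',  # placeholder dir
                                   params=params)
# Fast if model_fn returns no eval_metric_ops, very slow with them:
estimator.evaluate(input_fn=eval_input_fn)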
This is what I have:
import numpy as np
import tensorflow as tf

def compute_accuracy(preds, labels):
    # Both preds and labels are SparseTensors; total is the number of
    # ground-truth symbols.
    total = tf.shape(labels.values)[0]
    # Densify with distinct default values so padding never counts as a match
    preds = tf.sparse_to_dense(preds.indices, preds.dense_shape, preds.values, default_value=-1)
    labels = tf.sparse_to_dense(labels.indices, labels.dense_shape, labels.values, default_value=-2)
    # Crop both tensors to their common shape
    r = tf.shape(labels)[0]
    c = tf.minimum(tf.shape(labels)[1], tf.shape(preds)[1])
    preds = tf.slice(preds, [0, 0], [r, c])
    labels = tf.slice(labels, [0, 0], [r, c])
    preds = tf.cast(preds, tf.int32)
    labels = tf.cast(labels, tf.int32)
    # Fraction of label symbols predicted correctly
    correct = tf.reduce_sum(tf.cast(tf.equal(preds, labels), tf.int32))
    accuracy = tf.divide(correct, total)
    return accuracy
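To sanity-check it, here is a toy run with hand-built SparseTensors (the values are made up purely for illustration; two of three symbols match, so I expect 2/3):

preds = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                        values=tf.constant([3, 7, 5], dtype=tf.int64),
                        dense_shape=[2, 2])
labels = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                         values=tf.constant([3, 7, 9], dtype=tf.int64),
                         dense_shape=[2, 2])

with tf.Session() as sess:
    print(sess.run(compute_accuracy(preds, labels)))  # 2/3 ~= 0.667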
In model_fn:
edit_dist = tf.reduce_mean(tf.edit_distance(tf.cast(predicted_label[0], tf.int32), labels))
accuracy = compute_accuracy(predicted_label[0], labels)

tf.summary.scalar('edit_dist', edit_dist)
tf.summary.scalar('accuracy', accuracy)

metrics = {
    'accuracy': tf.metrics.mean(accuracy),
    'edit_dist': tf.metrics.mean(edit_dist),
}

if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)
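My understanding of tf.metrics.mean is that it returns a (value_op, update_op) pair: Estimator runs update_op once per evaluation batch and reads value_op at the end. A standalone toy sketch of that mechanic (the per-batch accuracies are made up):

accuracy = tf.placeholder(tf.float32, shape=[])  # stand-in for the per-batch accuracy tensor
value_op, update_op = tf.metrics.mean(accuracy)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())   # metric state lives in local variables
    for batch_acc in [0.5, 0.75, 1.0]:           # made-up per-batch values
        sess.run(update_op, feed_dict={accuracy: batch_acc})
    print(sess.run(value_op))                    # 0.75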
As requested, here is the complete model and TFRecord writer code:
def crnn_model(features, labels, mode, params):
    inputs = features['image']
    print("INPUTS SHAPE", inputs.shape)

    if mode == tf.estimator.ModeKeys.TRAIN:
        batch_size = params['batch_size']
        lr_initial = params['lr']
        # Exponential learning-rate decay, stepped at fixed intervals
        lr = tf.train.exponential_decay(lr_initial, global_step=tf.train.get_global_step(),
                                        decay_steps=params['lr_decay_steps'],
                                        decay_rate=params['lr_decay_rate'], staircase=True)
        tf.summary.scalar('lr', lr)
    else:
        batch_size = params['test_batch_size']

    with tf.variable_scope('crnn', reuse=False):
        rnn_output, predicted_label, logits = CRNN(inputs, hidden_size=params['hidden_size'],
                                                   batch_size=batch_size)

    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'predicted_label': predicted_label,
            'logits': logits,
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # ctc_loss expects int32 sequence lengths
    loss = tf.reduce_mean(tf.nn.ctc_loss(labels=labels, inputs=rnn_output,
                                         sequence_length=23 * np.ones(batch_size, dtype=np.int32),
                                         ignore_longer_outputs_than_inputs=True))
    edit_dist = tf.reduce_mean(tf.edit_distance(tf.cast(predicted_label[0], tf.int32), labels))
    accuracy = compute_accuracy(predicted_label[0], labels)

    metrics = {
        'accuracy': tf.metrics.mean(accuracy),
        'edit_dist': tf.metrics.mean(edit_dist),
    }

    tf.summary.scalar('loss', loss)
    tf.summary.scalar('edit_dist', edit_dist)
    tf.summary.scalar('accuracy', accuracy)

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

    assert mode == tf.estimator.ModeKeys.TRAIN
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        optimizer = tf.train.AdadeltaOptimizer(learning_rate=lr)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
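For context, predicted_label comes out of my CRNN helper; I believe it is the standard CTC decoder output, roughly like this (shown as an assumption, since the decoder lives inside CRNN):

# rnn_output is time-major [max_time, batch, num_classes]; the decoder
# returns a list of SparseTensors, so predicted_label[0] is the best path.
predicted_label, log_prob = tf.nn.ctc_beam_search_decoder(
    rnn_output, sequence_length=23 * np.ones(batch_size, dtype=np.int32))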
TFRecord writer code:
import sys

def _write_fn(self, out_file, image_list, label_list, mode):
    writer = tf.python_io.TFRecordWriter(out_file)
    N = len(image_list)
    for i in range(N):
        if (i % 1000) == 0:
            print('%s Data: %d/%d records saved' % (mode, i, N))
            sys.stdout.flush()
        try:
            #print('Try image: ', image_list[i])
            image = load_image(image_list[i])
        except (ValueError, AttributeError):
            # Skip unreadable images rather than aborting the whole run
            print('Ignoring image: ', image_list[i])
            continue
        label = label_list[i]
        feature = {
            'label': _int64_feature(label),
            'image': _byte_feature(tf.compat.as_bytes(image.tostring()))
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())
    writer.close()
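For completeness, the reading side parses the same feature keys back. A condensed sketch of my input_fn (file name, image dtype and batch size are placeholders, so treat them as assumptions):

def parse_fn(serialized):
    # Keys must match what _write_fn stored; the label was written as a
    # variable-length int64 list, so it comes back as a SparseTensor.
    features = tf.parse_single_example(serialized, {
        'label': tf.VarLenFeature(tf.int64),
        'image': tf.FixedLenFeature([], tf.string),
    })
    image = tf.decode_raw(features['image'], tf.uint8)  # dtype is an assumption
    return {'image': image}, features['label']

def eval_input_fn():
    dataset = tf.data.TFRecordDataset('val.tfrecord')  # placeholder path
    return dataset.map(parse_fn).batch(32)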