Tensorflow dynamic_rnn training loss decreasing, validation loss increasing

Question

I am adding my RNN text classification model. I am using last state to classify text. Dataset is small I am using glove vector for embedding.

def rnn_inputs(FLAGS, input_data):
    with tf.variable_scope('rnn_inputs', reuse=True):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units])
    embeddings = tf.nn.embedding_lookup(W_input, input_data)
    return embeddings

    self.inputs_X = tf.placeholder(tf.int32, shape=[None, None, FLAGS.num_dim_input], name='inputs_X')
    self.targets_y = tf.placeholder(tf.float32, shape=[None, None], name='targets_y')
    self.dropout = tf.placeholder(tf.float32, name='dropout')
    self.seq_leng = tf.placeholder(tf.int32, shape=[None, ], name='seq_leng')

    with tf.name_scope("RNNcell"):
        stacked_cell = rnn_cell(FLAGS, self.dropout)

    with tf.name_scope("Inputs"):
        with tf.variable_scope('rnn_inputs'):
            W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units], initializer=tf.truncated_normal_initializer(stddev=0.1))

        inputs = rnn_inputs(FLAGS, self.inputs_X)
        #initial_state = stacked_cell.zero_state(FLAGS.batch_size, tf.float32)

    with tf.name_scope("DynamicRnn"):
        # flat_inputs = tf.reshape(inputs, [FLAGS.batch_size, -1, FLAGS.num_hidden_units])
        flat_inputs = tf.transpose(tf.reshape(inputs, [-1, FLAGS.batch_size, FLAGS.num_hidden_units]), perm=[1, 0, 2])
        all_outputs, state = tf.nn.dynamic_rnn(cell=stacked_cell, inputs=flat_inputs, sequence_length=self.seq_leng, dtype=tf.float32)

        outputs = state[0]

    with tf.name_scope("Logits"):
        with tf.variable_scope('rnn_softmax'):
            W_softmax = tf.get_variable("W_softmax", [FLAGS.num_hidden_units, FLAGS.num_classes])
            b_softmax = tf.get_variable("b_softmax", [FLAGS.num_classes])

        logits = rnn_softmax(FLAGS, outputs)

        probabilities = tf.nn.softmax(logits, name="probabilities")
        self.accuracy = tf.equal(tf.argmax(self.targets_y,1), tf.argmax(logits,1))

    with tf.name_scope("Loss"):
        self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=self.targets_y))

    with tf.name_scope("Grad"):
        self.lr = tf.Variable(0.0, trainable=False)
        trainable_vars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.loss, trainable_vars), FLAGS.max_gradient_norm)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_optimizer = optimizer.apply_gradients(zip(grads, trainable_vars))

        sampling_outputs = all_outputs[0]

    sampling_logits = rnn_softmax(FLAGS, sampling_outputs)
    self.sampling_probabilities = tf.nn.softmax(sampling_logits)

Output printed

EPOCH 7 SUMMARY 40 STEP
Training loss 0.439
Training accuracy 0.247
----------------------
Validation loss 0.452
Validation accuracy 0.234
----------------------
Saving the model.

EPOCH 8 SUMMARY 45 STEP
Training loss 0.429
Training accuracy 0.281
----------------------
Validation loss 0.462
Validation accuracy 0.203
----------------------
Saving the model.

EPOCH 9 SUMMARY 50 STEP
Training loss 0.428
Training accuracy 0.268
----------------------
Validation loss 0.465
Validation accuracy 0.188
----------------------
Saving the model.

EPOCH 10 SUMMARY 55 STEP
Training loss 0.424
Training accuracy 0.284
----------------------
Validation loss 0.455
Validation accuracy 0.172
----------------------
Saving the model.

EPOCH 11 SUMMARY 60 STEP
Training loss 0.421
Training accuracy 0.305
----------------------
Validation loss 0.461
Validation accuracy 0.156
----------------------
Saving the model.

EPOCH 12 SUMMARY 65 STEP
Training loss 0.418
Training accuracy 0.299
----------------------
Validation loss 0.462
Validation accuracy 0.141
----------------------
Saving the model.

EPOCH 13 SUMMARY 70 STEP
Training loss 0.416
Training accuracy 0.286
----------------------
Validation loss 0.462
Validation accuracy 0.156
----------------------
Saving the model.

EPOCH 14 SUMMARY 75 STEP
Training loss 0.413
Training accuracy 0.323
----------------------
Validation loss 0.468
Validation accuracy 0.141
----------------------
Saving the model.

After 165 EPOCH

EPOCH 165 SUMMARY 830 STEP
Training loss 0.306
Training accuracy 0.544
----------------------
Validation loss 0.547
Validation accuracy 0.109
----------------------
Saving the model.

Sorry for being late, My question is how can training loss decrease but validation loss increasing — Wazy, Mar 23 '17 at 11:40

score 0 · Answer 1 · answered Mar 23 '17 at 11:55

0

If training loss goes down, but validation loss goes up, it is likely that you are running into the problem of overfitting. This means: Generally speaking it is not that hard for a machine learning algorithm to perform exceptionally well on the training set (i.e. training loss is very low). If the algorithm just memorizes the training data set, it will produce a perfect score.

The challenge in machine learning however is to devise a model that performs well on unseen data, i.e. data that was not presented to the algorithm during training. This is what your validation set represents. If a model performs well on unseen data, we say that it generalizes well. If a model performs only well on training data, we call this overfitting. A model that does not generalize well is essentially useless, as it did not learn anything about the underlying structure of the data but just memorized the training set. This is useless because a trained model will be used on new data and probably never data used during training.

So how can you prevent that:

Reduce your model's capacity, i.e. take a simpler model and see if this can still accomplish the task. A less powerful model is less susceptible to simply memorize the data. Cf. also Occam's razor.
Use regularization: use e.g. dropout regularization in your model or add e.g. L1 or L2 norm of your trainable parameters to your loss function.

To get more information about this, search online for regularization, overfitting, etc.

answered Mar 23 '17 at 11:55

kafman

2,862
1
29
51

When i removed this method `inputs = rnn_inputs(FLAGS, self.inputs_X)` i.e. I removed tf.nn.embedding_lookup. model trained well. I am already using glove vectors for embedding over that I was applying tf.nn.embedding_lookup, was that the issue, or in other words can we apply embedding_lookup on glove or wor2vec input – Wazy Mar 23 '17 at 12:59
what is `rnn_inputs` exactly? – kafman Mar 23 '17 at 15:41
Added `rnn_inputs` to the question – Wazy Mar 23 '17 at 15:48
1

It's difficult to say why it "trained well" after removing `rnn_inputs` (already, what does "train well" mean? The other model also "trained well" ...). By leaving out `rnn_inputs` you are changing a big part of your architecture, so it is natural that training behaves differently. Also, without `rnn_inputs` you have less parameters in the model, i.e. less capacity, i.e. less chances of running into overfitting. I'm not saying this proves my answer, just some food for thought. I'm sorry I can't give you a clearer answer. – kafman Mar 23 '17 at 15:54
Thank kaufmanu. By trained well I meant to say giving better accuracy. I understand your part of over-fitting, appreciate that. Thanks – Wazy Mar 24 '17 at 10:36
Got the same problem when applying tf.nn.embedding_lookup – AI_ROBOT Nov 30 '17 at 13:07

Tensorflow dynamic_rnn training loss decreasing, validation loss increasing

1 Answers1