TensorFlow / TFLearn LinearRegressor stops with a very high loss

Question

I am using TensorFlow 1.2. Here's the code:

import tensorflow as tf
import tensorflow.contrib.layers as layers
import numpy as np
import tensorflow.contrib.learn as tflearn

tf.logging.set_verbosity(tf.logging.INFO)

# Naturally this is a very simple straight line
# of y = -x + 10
train_x = np.asarray([0., 1., 2., 3., 4., 5.])
train_y = np.asarray([10., 9., 8., 7., 6., 5.])

test_x = np.asarray([10., 11., 12.])
test_y = np.asarray([0., -1., -2.])

input_fn_train = tflearn.io.numpy_input_fn({"x": train_x}, train_y, num_epochs=1000)
input_fn_test = tflearn.io.numpy_input_fn({"x": test_x}, test_y, num_epochs=1000)

validation_monitor = tflearn.monitors.ValidationMonitor(
    input_fn=input_fn_test,
    every_n_steps=10)

fts = [layers.real_valued_column('x')]

estimator = tflearn.LinearRegressor(feature_columns=fts)
estimator.fit(input_fn=input_fn_train,
              steps=1000,
              monitors=[validation_monitor])

print(estimator.evaluate(input_fn=input_fn_test))

It runs without errors, but the training stops at step 47 with a very high loss value:

INFO:tensorflow:Starting evaluation at 2017-06-18-20:52:10
INFO:tensorflow:Finished evaluation at 2017-06-18-20:52:10
INFO:tensorflow:Saving dict for global step 1: global_step = 1, loss = 12.5318
INFO:tensorflow:Validation (step 10): global_step = 1, loss = 12.5318
INFO:tensorflow:Saving checkpoints for 47 into    
INFO:tensorflow:Loss for final step: 19.3527.
INFO:tensorflow:Starting evaluation at 2017-06-18-20:52:11
INFO:tensorflow:Restoring parameters from   
INFO:tensorflow:Finished evaluation at 2017-06-18-20:52:11
INFO:tensorflow:Saving dict for global step 47: global_step = 47, loss = 271.831

{'global_step': 47, 'loss': 271.83133}

A few things I completely don't understand (admittedly I'm a complete noob in TF):

Why is the loss at step 10 smaller than the loss at step 47?
Why does TF decide to stop the training after that anyway?
Why do "INFO:tensorflow:Loss for final step: 19.3527." and the loss at step 47 not match each other?

I have implemented this very algorithm using vanilla TensorFlow and it works as expected, but I really can't grasp what LinearRegressor wants from me here.

score 3 · Answer 1 · answered Jul 06 '17 at 15:32

Here are some (partial) answers to your questions. They might not address everything, but hopefully they will give you some more insight.

Why does TF decide to stop the training? You have set num_epochs=1000, and the default batch_size of numpy_input_fn is 128 (see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/learn_io/numpy_io.py). num_epochs=1000 means that the fit method will go through the data at most 1000 times (or run 1000 steps, whichever occurs first). That is why fit runs for ceiling(1000 * 6 / 128) = 47 steps. Setting batch_size to 6 (the size of your training dataset) or num_epochs=None will give you more reasonable results. I suggest setting batch_size to at most 6, since cycling through your training samples more than once in a single step might not make much sense.
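The step count above can be checked with a few lines of plain Python. This is a minimal sketch of the arithmetic only. The helper name steps_until_exhausted is made up for illustration and is not part of TensorFlow, and it assumes the input function simply stops once num_epochs passes over the data have been consumed.

```python
import math


def steps_until_exhausted(num_examples, num_epochs, batch_size):
    """Number of batches produced before the input data runs out.

    Hypothetical helper (not a TensorFlow API): it assumes the input
    function yields num_examples * num_epochs rows in total, grouped
    into batches of batch_size, with a smaller final batch if needed.
    """
    total_rows = num_examples * num_epochs
    return math.ceil(total_rows / batch_size)


# Values from the question: 6 training points, num_epochs=1000,
# and the default batch_size of 128.
default_steps = steps_until_exhausted(6, 1000, 128)
print(default_steps)

# With batch_size=6, each step sees the whole training set once,
# so the data would last for the full 1000 steps.
full_batch_steps = steps_until_exhausted(6, 1000, 6)
print(full_batch_steps)
```

Under these assumptions the first call gives 47, matching the log in the question, and the second gives 1000.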
Why is the loss at step 10 smaller than the loss at step 47? There are a few different reasons the loss might not decrease. (a) The loss is not computed on the exact same data at each step. For instance, if your sample has size 100 and your batch_size is 32, every step computes the loss on the next batch of 32 (this continues cyclically). (b) Your learning rate is too high, so the loss bounces. To fix this, try reducing the learning rate or experiment with different optimizers. I believe the default optimizer in LinearRegressor is FtrlOptimizer. You can change its learning rate by passing an optimizer when you construct LinearRegressor:

estimator = tflearn.LinearRegressor(
    feature_columns=fts,
    optimizer=tf.train.FtrlOptimizer(learning_rate=...))

Alternatively, you can try a different optimizer altogether:

estimator = tflearn.LinearRegressor(
    feature_columns=fts,
    optimizer=tf.train.GradientDescentOptimizer(learning_rate=...))
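To see why a learning rate that is too high makes the loss bounce or blow up, here is a minimal sketch in plain Python rather than TensorFlow. It runs ordinary full-batch gradient descent on the question's training data. It is not what LinearRegressor does internally, and the function names and the learning rates 0.05 and 0.5 are chosen purely for illustration.

```python
# Training data from the question: y = -x + 10.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]


def mse(w, b):
    """Mean squared error of the line y = w * x + b on the training data."""
    n = len(train_x)
    return sum((w * x + b - y) ** 2 for x, y in zip(train_x, train_y)) / n


def gradient_descent(learning_rate, steps):
    """Plain full-batch gradient descent on MSE, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(train_x)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(train_x, train_y)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, train_x)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


initial_loss = mse(0.0, 0.0)

# A small learning rate converges towards w = -1, b = 10.
w_ok, b_ok = gradient_descent(learning_rate=0.05, steps=2000)
loss_ok = mse(w_ok, b_ok)

# A learning rate that is too large makes the loss grow instead of shrink.
w_bad, b_bad = gradient_descent(learning_rate=0.5, steps=20)
loss_bad = mse(w_bad, b_bad)

print(w_ok, b_ok, loss_ok)
print(initial_loss, loss_bad)
```

With the smaller rate the parameters settle near the true line, while with the larger rate each update overshoots and the loss ends up far above its starting value. The usable range depends on the data and the optimizer, so the specific numbers here should not be read as recommendations for LinearRegressor.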

As you say in the last line, I agree that the optimizer could be changed to GradientDescent. However, I tried that and it raises an error: "ERROR:tensorflow:Model diverged with loss = NaN." because you need a loss function. Where do you instantiate it and where do you apply it? You need Y_predict to be able to calculate loss_funcion = tf.losses.mean_squared_error(yf, Y_predict). Thanks! — Julio CamPlaz, Jun 21 '18 at 08:13
