
I'm trying to construct a little educational example for multivariate linear regression, but the loss keeps increasing until it explodes rather than decreasing. Any idea why?

import tensorflow as tf
tf.__version__

import numpy as np
data = np.array(
    [
        [100,35,35,12,0.32],
        [101,46,35,21,0.34],
        [130,56,46,3412,12.42],
        [131,58,48,3542,13.43]
    ]
)

x = data[:,1:-1]
y_target = data[:,-1]

# weights and bias to be learned (initial values not shown in the original snippet; ones used here)
w = tf.Variable([1, 1, 1], dtype=tf.float64)
b = tf.Variable(1, dtype=tf.float64)

def linear_model(x):
    # prediction = b + x . w
    return b + tf.tensordot(x, w, axes=1)

def loss_function(y, pred):
    # mean squared error
    return tf.reduce_mean(tf.square(y - pred))

def train(b, w, x, y, lr=0.012):
    # one step of plain gradient descent on w and b
    with tf.GradientTape() as t:
        current_loss = loss_function(y, linear_model(x))
    grad_w, grad_b = t.gradient(current_loss, [w, b])
    w.assign_sub(lr * grad_w)
    b.assign_sub(lr * grad_b)

epochs = 80
for epoch_count in range(epochs):
    real_loss = loss_function(y_target, linear_model(x))
    train(b, w, x, y_target, lr=0.12)
    print(f"Epoch count {epoch_count}: Loss value: {real_loss.numpy()}")

This even happens if I initialize the weights with the "correct" values (found via a scikit-learn regressor):

w = tf.Variable([-1.76770250e-04,3.46688912e-01,2.43827475e-03],dtype=tf.float64)
b = tf.Variable(-11.837184241807234,dtype=tf.float64)
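
For reference, a minimal sketch of how such coefficients could be obtained; the exact scikit-learn call isn't shown in the question, so this is an assumption:

# Hypothetical reconstruction: fit ordinary least squares on the same x / y_target as above
from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(x, y_target)
print(reg.coef_)       # compare with w above
print(reg.intercept_)  # compare with b above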
Romeo Kienzler
  • what happens if you use a TF optimizer instead of doing manual assigns? – thushv89 Nov 28 '19 at 20:21
  • I think in TF2 with Eager Execution this is not possible, documentation says I have to use GradientTape. In case I'm wrong, do you by chance have a code snippet? tf.train.GradientDescentOptimizer is not available in TF2 – Romeo Kienzler Nov 28 '19 at 20:42
  • You definitely can use Optimizers in TF2. I'll post a code snippet in the answer section. – thushv89 Nov 28 '19 at 20:55
  • Thank you very much! This works. Still not clear why it doesn't work manually but at least I have a working example for now – Romeo Kienzler Nov 28 '19 at 21:20
  • So did the exploding gradient problem go away when you changed it to a TF optimizer or not really? – thushv89 Nov 28 '19 at 21:48
  • Yes. It's gone. But only with Adam. With SGD it wasn't exploding, but it also wasn't converging, just staying constant. So not 100% solved. – Romeo Kienzler Nov 29 '19 at 11:31
  • Have you tried a higher learning rate for SGD? One thing I suspect is that your learning rate is too small. That would sort of explain why Adam works and SGD doesn't. (A sketch of such a learning-rate sweep follows this comment thread.) – thushv89 Nov 29 '19 at 11:33
  • Whatever learning rate I choose, with SGD the loss converges to 3.5, with Adam it goes down far below zero... – Romeo Kienzler Dec 01 '19 at 20:56
  • btw. similar trouble with Keras now. Maybe my data set is particularly challenging? https://stackoverflow.com/questions/59129802/loss-not-changeing-in-very-simple-keras-binary-classifier – Romeo Kienzler Dec 01 '19 at 20:57
  • So, yeah think as said in the comments, you should use sigmoid, not SoftMax. Let me know if that doesn't work. I'll have a look :) – thushv89 Dec 01 '19 at 21:42
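
Following up on the learning-rate suggestion in the comments, here is a minimal sketch (not from the original thread) of sweeping the SGD learning rate on the question's data; it reuses linear_model, loss_function, x and y_target from above:

# Illustrative sweep (assumption: same data and model as in the question);
# watch whether the MSE loss shrinks, stalls, or explodes for each rate.
for lr in [1e-6, 1e-4, 1e-2, 1e-1]:
    w = tf.Variable([1.0, 1.0, 1.0], dtype=tf.float64)
    b = tf.Variable(1.0, dtype=tf.float64)
    optimizer = tf.keras.optimizers.SGD(learning_rate=lr)
    for _ in range(80):
        with tf.GradientTape() as tape:
            loss = loss_function(y_target, linear_model(x))
        grads = tape.gradient(loss, [b, w])
        optimizer.apply_gradients(zip(grads, [b, w]))
    print(f"lr={lr}: final loss {loss.numpy()}")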

2 Answers


Here's how you might use a TF2 optimizer for a toy example (as per the comment). I know this is not the answer, but I didn't want to post this in the comments section, as it would mess up the indentation and all that.

import tensorflow as tf

tf_x = tf.Variable(tf.constant(2.0, dtype=tf.float32), name='x')
optimizer = tf.optimizers.SGD(learning_rate=0.1)

# Optimizing tf_x using gradient tape: minimize y = x**2
x_series, y_series = [], []
for step in range(5):
    x_series.append(tf_x.numpy().item())
    with tf.GradientTape() as tape:
        tf_y = tf_x**2
    y_series.append(tf_y.numpy().item())

    gradients = tape.gradient(tf_y, tf_x)
    optimizer.apply_gradients(zip([gradients], [tf_x]))
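
As a sanity check: with y = x² and learning_rate=0.1, each step computes x ← x − 0.1·2x = 0.8·x, so x_series should come out as roughly [2.0, 1.6, 1.28, 1.024, 0.8192].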
thushv89

Based on @thushv89's input, here is an intermediate solution using a TF2 optimizer. It works, although it doesn't fully answer my original question.

import tensorflow as tf
tf.__version__

import numpy as np
data = np.array(
    [
        [100,35,35,12,0.32],
        [101,46,35,21,0.34],
        [130,56,46,3412,12.42],
        [131,58,48,3542,13.43]
    ]
)

x = data[:,1:-1]
y_target = data[:,-1]

w = tf.Variable([1,1,1],dtype=tf.float64)
b = tf.Variable(1,dtype=tf.float64)

def linear_model(x):
    return b + tf.tensordot(x,w,axes=1)

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.MeanSquaredLogarithmicError()

def train_step(x, y):
    # one optimization step: forward pass, loss, gradients, parameter update
    with tf.GradientTape() as tape:
        predicted = linear_model(x)
        loss_value = loss_object(y, predicted)
    print(f"Loss Value: {loss_value}")
    grads = tape.gradient(loss_value, [b, w])
    optimizer.apply_gradients(zip(grads, [b, w]))

def train(epochs):
    for epoch in range(epochs):
        train_step(x, y_target)
    print('Epoch {} finished'.format(epoch))

train(epochs=1000)
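
A small follow-up (not part of the original answer): after training, the learned parameters can be printed and compared with the scikit-learn values from the question.

print("w =", w.numpy(), "b =", b.numpy())
print("predictions:", linear_model(x).numpy())
print("targets:", y_target)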
Romeo Kienzler