Keras Sequential Model Non-linear Regression Model Bad Prediction

Question

To test a nonlinear Sequential model in Keras, I generated random data x1, x2, x3 and set y = a + b*x1 + c*x2^2 + d*x3^3 + e (a, b, c, d, e are constants). The loss drops very quickly, but the model's predictions are quite wrong. I did the same thing for a linear model with similar code and it worked fine, so maybe my Sequential model is designed wrong. Here is my code:


import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras import initializers

# target: y = a + b*x1 + c*x2^2 + d*x3^3 + e

def gen_sequential_model():
    model = Sequential([Input(shape=(3,), name='input_layer'),
    Dense(16, activation = 'relu', name = 'hidden_layer1', kernel_initializer=initializers.RandomNormal(mean = 0.0, stddev= 0.05, seed=42)),
    Dense(16, activation = 'relu', name = 'hidden_layer2', kernel_initializer=initializers.RandomNormal(mean = 0.0, stddev= 0.05, seed=42)),
    Dense(1, activation = 'relu', name = 'output_layer', kernel_initializer=initializers.RandomNormal(mean = 0.0, stddev= 0.05, seed=42)),
    ])

    model.summary()
    model.compile(optimizer='adam',loss='mse')
    return model

def gen_linear_regression_dataset(numofsamples=500, a=3, b=5, c=7, d=9, e=11):
    np.random.seed(42)
    X = np.random.rand(numofsamples,3)
    # y = a + bx1 + cx2^2 + dx3^3+ e
    for idx in range(numofsamples):
        X[idx][1] = X[idx][1]**2
        X[idx][2] = X[idx][2]**3
    coef = np.array([b,c,d])
    bias = e

    y = a + np.matmul(X,coef.transpose()) + bias
    return X, y

def plot_loss_curve(history):
    import matplotlib.pyplot as plt
    
    plt.figure(figsize = (15,10))

    plt.plot(history.history['loss'][1:])
    plt.plot(history.history['val_loss'][1:])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train','test'],loc = 'upper right')
    plt.show()

def predict_new_sample(model, x, a=3, b=5, c=7, d=9, e=11):
    x = x.reshape(1,3)
    y_pred = model.predict(x)[0][0]
    y_actual = a + b*x[0][0] + c*(x[0][1]**2) + d*(x[0][2]**3) + e

    print("y actual value: ", y_actual)
    print("y pred value: ", y_pred)


model = gen_sequential_model()
X,y = gen_linear_regression_dataset(numofsamples=2000)
history = model.fit(X,y,epochs = 100, verbose=2, validation_split=0.3)
plot_loss_curve(history)

predict_new_sample(model, np.array([0.7,0.5,0.5]))

Result:

...
Epoch 99/100
44/44 - 0s - loss: 1.0631e-10 - val_loss: 9.9290e-11
Epoch 100/100
44/44 - 0s - loss: 1.0335e-10 - val_loss: 9.3616e-11
y actual value:  20.375
y pred value:  25.50001

Why is my predicted value so different from the real value?

(+1) for posting a *reproducible* example (very seldom nowadays), without which it would be impossible to spot the issue. Keep on like that... — desertnaut, Dec 01 '20 at 12:36

score 5 · Accepted Answer · answered Dec 01 '20 at 01:59

Despite the improper use of activation = 'relu' in the last layer and the use of non-recommended kernel initializers, your model works fine, and the reported metrics are genuine, not flukes.

The problem is not in the model; the problem is that your data generating function does not return what you intend it to return.

First, in order to see that your model indeed learns what you have asked it to learn, let's run your code as is and then use your data generating function to produce a sample:

X, y_true = gen_linear_regression_dataset(numofsamples=1)
print(X)
print(y_true)

Result:

[[0.37454012 0.90385769 0.39221343]]
[25.72962531]

So for this particular X, the true output is 25.72962531; let's now pass this X to the model using your predict_new_sample function:

predict_new_sample(model, X)
# result:
y actual value:  22.134424269890232
y pred value:  25.729633

Well, the predicted output 25.729633 is extremely close to the true one calculated above (25.72962531); the thing is, your function thinks that the true output should be 22.134424269890232, which is demonstrably not the case.

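What has happened is that your gen_linear_regression_dataset function returns the data X after you have calculated the squared and cubic components, which is not what you want; you want the returned X to hold the raw values from before the square and cube are computed, so that your model learns how to do this itself. As written, the labels are an exact linear function of the returned X, which is easy to fit (hence the tiny loss), while predict_new_sample squares and cubes x2 and x3 a second time when it computes its "actual" value.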
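The same mismatch can be reproduced without training anything. A NumPy-only sketch: it draws the first sample exactly as your generator does (np.random.seed(42)), builds the transformed row the original function returns, and evaluates the intended formula on both. `target` is a hypothetical helper written only for this illustration.

```python
import numpy as np

A, B, C, D, E = 3, 5, 7, 9, 11  # constants used in the question


def target(x):
    """Hypothetical helper: the intended y = a + b*x1 + c*x2^2 + d*x3^3 + e."""
    return A + B * x[0] + C * x[1] ** 2 + D * x[2] ** 3 + E


np.random.seed(42)
x_raw = np.random.rand(1, 3)[0]  # what the model *should* receive
# What the original generator actually returns: x2 squared, x3 cubed
x_returned = np.array([x_raw[0], x_raw[1] ** 2, x_raw[2] ** 3])

y_label = target(x_raw)            # label stored by the generator
y_recomputed = target(x_returned)  # what predict_new_sample computes from the returned X

print(y_label)       # ~25.7296, matches the model's prediction
print(y_recomputed)  # ~22.1344, the "y actual value" printed above
```

The first value is the stored label, which the model reproduces; the second is what predict_new_sample reports, because it squares and cubes features that were already squared and cubed.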

So, you need to change the function as follows:

def gen_linear_regression_dataset(numofsamples=500, a=3, b=5, c=7, d=9, e=11):
    np.random.seed(42)
    X_init = np.random.rand(numofsamples,3)  # data to be returned
    # y = a + bx1 + cx2^2 + dx3^3+ e
    X = X_init.copy()  # temporary data
    for idx in range(numofsamples):
        X[idx][1] = X[idx][1]**2
        X[idx][2] = X[idx][2]**3
    coef = np.array([b,c,d])
    bias = e

    y = a + np.matmul(X,coef.transpose()) + bias
    return X_init, y
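The per-row loop and the temporary copy are not strictly needed. A vectorized sketch of the same corrected generator, assuming you are happy to compute the powers column-wise with NumPy; the name `gen_nonlinear_dataset` and the `seed` parameter are hypothetical additions, not part of the original code.

```python
import numpy as np


def gen_nonlinear_dataset(numofsamples=500, a=3, b=5, c=7, d=9, e=11, seed=42):
    """Hypothetical vectorized variant of the corrected generator.

    Returns the raw inputs X and labels y = a + b*x1 + c*x2^2 + d*x3^3 + e,
    so the network has to learn the square and cube itself.
    """
    np.random.seed(seed)
    X = np.random.rand(numofsamples, 3)  # raw features, returned unchanged
    # Build y from column-wise powers; X itself is never modified.
    y = a + b * X[:, 0] + c * X[:, 1] ** 2 + d * X[:, 2] ** 3 + e
    return X, y


X, y = gen_nonlinear_dataset(numofsamples=1)
print(X)  # [[0.37454012 0.95071431 0.73199394]]
print(y)  # [25.72962531]
```

Because X is never overwritten, there is no need for the X_init/X split, and the seed parameter makes it easy to draw a separate set of samples.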

After modifying the function and re-training the model (you'll notice that the validation error ends up somewhat higher, ~ 1.3), we have

X, y_true = gen_linear_regression_dataset(numofsamples=1)
print(X)
print(y_true)

Result:

[[0.37454012 0.95071431 0.73199394]]
[25.72962531]

and

predict_new_sample(model, X)
# result:
y actual value:  25.729625308532768
y pred value:  25.443237

which is consistent. You will still not be getting perfect predictions of course, especially for unseen data (and remember that the error is now higher):

predict_new_sample(model, np.array([0.07,0.6,0.5]))
# result:
y actual value:  17.995
y pred value:  19.69147
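One caveat when checking consistency: gen_linear_regression_dataset always calls np.random.seed(42), so gen_linear_regression_dataset(numofsamples=1) returns the first row of the training data, and since validation_split=0.3 holds out the last 30% of rows, that row was trained on. A single hand-picked point is also a noisy check. A minimal sketch of a fresh held-out set and an MSE over all of it; `make_test_set`, `mse_on` and `oracle` are hypothetical helpers, and with a real model `predict_fn` would be something like `lambda X: model.predict(X, verbose=0)`.

```python
import numpy as np


def make_test_set(n=1000, a=3, b=5, c=7, d=9, e=11, seed=123):
    """Hypothetical helper: fresh raw inputs from a separate generator and seed."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, 3))
    y = a + b * X[:, 0] + c * X[:, 1] ** 2 + d * X[:, 2] ** 3 + e
    return X, y


def mse_on(predict_fn, X, y):
    """Hypothetical helper: mean squared error of predict_fn over a whole batch."""
    y_pred = np.asarray(predict_fn(X)).reshape(-1)
    return float(np.mean((y_pred - y) ** 2))


def oracle(X):
    """Hypothetical 'perfect model' that knows the formula, for a sanity check."""
    return 3 + 5 * X[:, 0] + 7 * X[:, 1] ** 2 + 9 * X[:, 2] ** 3 + 11


X_test, y_test = make_test_set()
# With a trained Keras model this would be, e.g.:
#   mse_on(lambda X: model.predict(X, verbose=0), X_test, y_test)
print(mse_on(oracle, X_test, y_test))  # 0.0
```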

As briefly mentioned above, you should really change your model to get rid of the custom kernel initializers (i.e. use the default, recommended ones) and use the correct activation function for your last layer, which for regression is linear:

def gen_sequential_model():
    model = Sequential([Input(shape=(3,), name='input_layer'),
    Dense(16, activation = 'relu', name = 'hidden_layer1'),
    Dense(16, activation = 'relu', name = 'hidden_layer2'),
    Dense(1, activation = 'linear', name = 'output_layer'),
    ])

    model.summary()
    model.compile(optimizer='adam',loss='mse')
    return model

You'll find that you get a lower validation error and better predictions:

predict_new_sample(model, np.array([0.07,0.6,0.5]))
# result:
y actual value:  17.995
y pred value:  18.272991
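The two model changes can be made concrete with a small NumPy sketch (an illustration only, not Keras internals). Keras' Dense layer defaults to kernel_initializer='glorot_uniform', which samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), so its scale adapts to the layer shape; the first part compares its standard deviation with the fixed stddev=0.05 from the question. The second part shows what relu does to an output unit: negative pre-activations become 0 and receive zero gradient, which a linear output avoids. Since every target here lies between 14 and 35, a relu output can still fit, which is consistent with the original model working. The helper names are hypothetical.

```python
import math

import numpy as np


def glorot_uniform_std(fan_in, fan_out):
    """Hypothetical helper: std of U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return limit / math.sqrt(3.0)  # a uniform on [-limit, limit] has std limit / sqrt(3)


# 1) Initializer scale per layer shape vs. the question's fixed stddev of 0.05
for name, (fan_in, fan_out) in {"hidden_layer1": (3, 16),
                                "hidden_layer2": (16, 16),
                                "output_layer": (16, 1)}.items():
    std = glorot_uniform_std(fan_in, fan_out)
    print(f"{name}: glorot_uniform std = {std:.3f} ({std / 0.05:.1f}x of 0.05)")


# 2) Output activation: relu clips negatives to 0 and blocks their gradient
def relu(z):
    return np.maximum(z, 0.0)


def relu_grad(z):
    return (z > 0).astype(float)  # derivative at z == 0 taken to be 0


z = np.array([-3.0, -0.5, 0.0, 2.0, 20.0])  # example output pre-activations
print(relu(z))       # [ 0.  0.  0.  2. 20.]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```

For these layer shapes the default works out to roughly 5 to 7 times the 0.05 used in the question; whether that alone explains the better validation error is not something this sketch shows.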

score 0 · Answer 2 · answered Dec 01 '20 at 08:22

Nice catch from @desertnaut.

Just to add a few things on top of @desertnaut's solution that seem to improve the results:

- Scale your data (even though your inputs are already in the 0-1 range, it seems to give a small boost)
- Add Dropout between layers
- Increase the number of epochs (150-200?)
- Add a ReduceLROnPlateau callback (worth a try)
- Add more units to the layers

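import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

# Note: Adam's default learning rate is already 0.001, so with min_lr=0.001
# this callback cannot lower it; a smaller min_lr (e.g. 1e-5) lets it take effect.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                                 patience=5, min_lr=0.001)

def gen_sequential_model():
    model = Sequential([
        Input(shape=(3,), name='input_layer'),
        Dense(64, activation='relu', name='hidden_layer1'),
        Dropout(0.5),
        Dense(64, activation='relu', name='hidden_layer2'),
        Dropout(0.5),
        Dense(1, name='output_layer')  # default (linear) activation for regression
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# uses the corrected gen_linear_regression_dataset and predict_new_sample from above
model = gen_sequential_model()
X, y = gen_linear_regression_dataset(numofsamples=2000)
history = model.fit(X, y, epochs=200, verbose=2, validation_split=0.2, callbacks=[reduce_lr])

predict_new_sample(model, x=np.array([0.07, 0.6, 0.5]))
# result:
y actual value:  17.995
y pred value:  17.710054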
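The first suggestion, scaling, is not part of the snippet above. A minimal NumPy sketch of standardizing the inputs with statistics computed on the training rows only and reusing them for new samples; `fit_standardizer` and `standardize` are hypothetical helpers (scikit-learn's StandardScaler does the same job). Since the inputs are already in [0, 1), any gain is likely small, as the answer says.

```python
import numpy as np


def fit_standardizer(X_train):
    """Hypothetical helper: per-feature mean and std from the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return mean, std


def standardize(X, mean, std):
    """Hypothetical helper: apply the stored training statistics."""
    return (X - mean) / std


np.random.seed(42)
X = np.random.rand(2000, 3)  # raw inputs, as in the corrected generator

# Keras' validation_split=0.2 uses the *last* 20% of rows for validation,
# so fit the statistics on the first 80% to avoid leaking validation data.
n_train = int(len(X) * 0.8)
mean, std = fit_standardizer(X[:n_train])
X_scaled = standardize(X, mean, std)  # would be passed to model.fit instead of X

# New samples must go through the same transform before model.predict:
x_new = standardize(np.array([[0.07, 0.6, 0.5]]), mean, std)
```

If you scale the inputs this way, predict_new_sample should feed the model the scaled x while still computing the actual value from the raw x.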
