
I use TensorFlow to implement a simple multi-layer perceptron for regression. The code is modified from the standard MNIST classifier: I only changed the output cost to MSE (using tf.reduce_mean(tf.square(pred - y))) and some input/output size settings. However, if I train the network for regression, after several epochs the outputs in a batch are all identical. For example:

target: 48.129, estimated: 42.634
target: 46.590, estimated: 42.634
target: 34.209, estimated: 42.634
target: 69.677, estimated: 42.634
......

I have tried different batch sizes, different initializations, and input normalization with sklearn.preprocessing.scale (the ranges of my inputs are quite different). None of that worked. I have also tried the sklearn example from TensorFlow (Deep Neural Network Regression with Boston Data), but I got another error at line 40:

'module' object has no attribute 'infer_real_valued_columns_from_input'

Does anyone have a clue where the problem is? Thank you.

My code is listed below. It may be a little long, but it is very straightforward:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import learn
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn import datasets, linear_model
from sklearn import cross_validation
import numpy as np

boston = learn.datasets.load_dataset('boston')
x, y = boston.data, boston.target
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(
x, y, test_size=0.2, random_state=42)

total_len = X_train.shape[0]

# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9
# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 200 # 2nd layer number of features
n_hidden_3 = 200
n_hidden_4 = 256
n_input = X_train.shape[1]
n_classes = 1

# tf Graph input
x = tf.placeholder("float", [None, 13])
y = tf.placeholder("float", [None])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Hidden layer with RELU activation
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.relu(layer_3)

    # Hidden layer with RELU activation
    layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
    layer_4 = tf.nn.relu(layer_4)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], 0, 0.1)),
    'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_hidden_4, n_classes], 0, 0.1))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
    'b3': tf.Variable(tf.random_normal([n_hidden_3], 0, 0.1)),
    'b4': tf.Variable(tf.random_normal([n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(total_len/batch_size)
        # Loop over all batches
        for i in range(total_batch-1):
            batch_x = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = Y_train[i*batch_size:(i+1)*batch_size]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, p = sess.run([optimizer, cost, pred], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch

        # sample prediction
        label_value = batch_y
        estimate = p
        err = label_value-estimate
        print ("num batch:", total_batch)

        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
            print ("[*]----------------------------")
            for i in range(3):
                print ("label value:", label_value[i], \
                    "estimated value:", estimate[i])
            print ("[*]============================")

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Accuracy:", accuracy.eval({x: X_test, y: Y_test}))
Sufeng Niu
  • As a sidenote, the evaluation part is wrong. Since you are performing regression, you should evaluate the squared error (in your case `cost`) as follows (while inside the session): `accuracy = sess.run(cost, feed_dict={x: X_test, y: Y_test})`, and for the values themselves you may do `predicted_vals = sess.run(pred, feed_dict={x: X_test})`. – Kots Mar 27 '17 at 15:48
  • @Kots Thank you, you are right. The original code was directly modified from the classification example, so I forgot to change that part. I will update the source soon. – Sufeng Niu Apr 01 '17 at 03:56
  • @SufengNiu can you please share your fixed code? I'm facing the same problem: when I take the transpose of pred (pred = tf.transpose(pred)) as @CNugteren said, I get a dimensions error, and when I set the batch size to 1, I get an index-out-of-bounds error. I couldn't fix it. – Nargis May 25 '17 at 11:32
  • @Itkrux, I had to make the following two changes so that the dimensions of pred and y match (sketched after these comments): 1. Added the line "y = np.reshape(y, [y.shape[0], 1])" after reading the Boston data, i.e. after "x, y = boston.data, boston.target". 2. Changed "y = tf.placeholder("float", [None])" to "y = tf.placeholder("float", [None, 1])". I get relatively better predictions with this, and 500 epochs end with a much lower cost. The accuracy values are showing low too, so I am still looking into that. – TechnoIndifferent May 31 '17 at 20:55
  • Sorry, the accuracy values are accurate, so I think my code works properly now. – TechnoIndifferent May 31 '17 at 21:12
  • Thank you @TechnoIndifferent, I'm facing the same problem, so I'll try this out. – Nargis Jun 01 '17 at 21:45
  • Hello, I am facing a similar problem: all the results in a batch always yield the same value. I tried to reshape the y vector as suggested by @TechnoIndifferent, but I still face the same issue. When I try to transpose the `pred` vector I get a shape error. Can someone help me find a solution? – Vasanti Oct 13 '17 at 16:03
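
For reference, here is a minimal sketch of the reshape fix @TechnoIndifferent describes above, using the question's variable names. It is an alternative to transposing pred: instead, it gives the labels the same (batch_size, 1) shape as the network output.

# After loading the data (and before train_test_split), give the targets
# an explicit second dimension so Y_train/Y_test have shape (n, 1):
x, y = boston.data, boston.target
y = np.reshape(y, [y.shape[0], 1])

# Later, make the label placeholder shape match the (batch_size, 1) output:
y = tf.placeholder("float", [None, 1])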

2 Answers


Short answer:

Transpose your pred vector using tf.transpose(pred).

Longer answer:

The problem is that pred (the predictions) and y (the labels) do not have the same shape: pred is a column vector of shape (batch_size, 1), while y is a flat vector of shape (batch_size,). When you apply an element-wise operation to them, broadcasting turns the result into a (batch_size, batch_size) matrix, which is not what you want: the MSE is then averaged over all pairwise differences instead of over the matching prediction/label pairs.
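
You can verify the broadcasting effect with a quick NumPy check (shapes chosen to mimic a batch of 10):

import numpy as np

pred = np.zeros((10, 1))   # column vector, like the network output
y = np.zeros((10,))        # flat vector, like the fed labels
print((pred - y).shape)    # prints (10, 10): a matrix, so the MSE averages 100 differences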

The solution is to transpose the prediction vector using tf.transpose() to get a proper vector and thus a proper loss function. In fact, if you set the batch size to 1 in your example, you'll see that it works even without the fix, because transposing a 1x1 matrix is a no-op.
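
Applied to the question's code, the fix is one extra line between constructing the model and defining the cost (same variable names as above):

# Construct model
pred = multilayer_perceptron(x, weights, biases)
pred = tf.transpose(pred)  # (batch_size, 1) -> (1, batch_size), so pred - y broadcasts to a vector

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred - y))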

I applied this fix to your example code and observed the following behaviour. Before the fix:

Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]

And after the fix at the same point in time:

Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]

You'll see that the cost is much lower and that the network actually learned the value 50 properly. You'll have to do some fine-tuning of the learning rate and such to improve your results, of course.

CNugteren
  • Thank you so much! – Sufeng Niu Jul 28 '16 at 15:13
  • It was so helpful! Thanks! – Simankov Feb 19 '17 at 17:14
  • I know it's quite some time, but can you let us know how did you realise what was happening there? – Kots Mar 27 '17 at 13:06
  • @kots By printing the shapes of the predictions and labels. You can do so using "tuple(tensor_name.get_shape().as_list())" – CNugteren Mar 27 '17 at 15:04
  • Is there anyway to send all the tensor dimensions for checking, to tensorboard? – Kots Mar 27 '17 at 17:27
  • @CNugteren, I applied ```tf.transpose()``` and, as @Sufeng_Niu said, the cost drops to the values you mentioned. But the code crashes when it reaches ```correct_prediction``` due to the dimensions. When I run it, the dimension of pred is ```(?, 1)``` and of y ```(?,)```. I am trying to measure the MSE but evidently I cannot, as ```pred```'s dimensions cannot be compared with ```Y_test```'s. – iblasi May 23 '17 at 13:47
  • Thanks! You saved my day! – Alan Apr 03 '18 at 05:23

There is likely a problem with your dataset loading or indexing implementation. If you only modified the cost to MSE, make sure pred and y are being updated correctly and that you did not overwrite them with a different graph operation.

Another thing that would help debugging is to print the actual regression outputs. It would also help if you posted more of your code so we can see your specific data loading implementation, etc.
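
For example, a quick check along those lines (a sketch; run it inside the session and adapt the names to your code) would be:

# Static shapes of the graph tensors; a mismatch here is the usual culprit.
print("pred shape:", tuple(pred.get_shape().as_list()))   # e.g. (None, 1)
print("y shape:", tuple(y.get_shape().as_list()))         # e.g. (None,)

# A few actual predictions on the test set, to see whether they vary.
predicted_vals = sess.run(pred, feed_dict={x: X_test})
print(predicted_vals[:3])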

ahaque
  • Thank you, please check the updated question. I agree that the label index might have been set incorrectly, but I checked the index of y and printed out the label values, and there seems to be no problem. The same code works for MNIST classification, which I verified. As you suggested, I also checked the actual regression outputs: the outputs within one batch are almost the same. The code may be a little long, but it is actually very straightforward. Do you have any suggestions? Thank you. – Sufeng Niu Jul 15 '16 at 21:54