
I am trying to use Linear Regression on the Ames Housing dataset available on Kaggle.

I did some manual cleanup of the data by removing many features first. Then I used the following implementation to train.

train_size = np.shape(x_train)[0]
valid_size = np.shape(x_valid)[0]
test_size = np.shape(x_test)[0]
num_features = np.shape(x_train)[1]

graph = tf.Graph()
with graph.as_default():

    # Input
    tf_train_dataset = tf.constant(x_train)
    tf_train_labels = tf.constant(y_train)
    tf_valid_dataset = tf.constant(x_valid)
    tf_test_dataset = tf.constant(x_test)

    # Variables
    weights = tf.Variable(tf.truncated_normal([num_features, 1]))
    biases = tf.Variable(tf.zeros([1]))

    # Loss Computation
    train_prediction = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.losses.mean_squared_error(tf_train_labels, train_prediction)

    # Optimizer
    # Gradient descent optimizer with learning rate = alpha
    alpha = tf.constant(0.000000003, dtype=tf.float64)
    optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(loss)

    # Predictions
    valid_prediction = tf.matmul(tf_valid_dataset, weights) + biases
    test_prediction = tf.matmul(tf_test_dataset, weights) + biases

This is how my graph runs:

num_steps = 10001

def accuracy(prediction, labels):
    return ((prediction - labels) ** 2).mean(axis=None)

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % 1000 == 0):
            print('Loss at step %d: %f' % (step, l))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), y_valid))
    t_pred = test_prediction.eval()
    print('Test accuracy: %.1f%%' % accuracy(t_pred, y_test))

Here is what I've tried:

  1. I have tried increasing the learning rate. But if I increase the learning rate beyond what I'm using right now, the model fails to converge, i.e. the loss explodes to infinity.

  2. Increased the number of iterations to 10,000,000. The loss converges more slowly the longer I iterate (which is understandable), but I'm still very far from a reasonable value; the loss is usually a 10-digit number.

Am I doing something wrong with the graph? Or is linear regression a bad choice for this, and should I try another algorithm? Any help and suggestions are much appreciated!

1 Answer

Working Code

import csv
import tensorflow as tf
import numpy as np

with open('train.csv', 'rt') as f:
    reader = csv.reader(f)
    your_list = list(reader)

def toFloatNoFail(data):
    # Convert a field to float; non-numeric fields (e.g. categorical strings) become 0
    try:
        return float(data)
    except (ValueError, TypeError):
        return 0.0

# Skip the header row and coerce every field to a float
data = [[toFloatNoFail(x) for x in row] for row in your_list[1:]]
data = np.array(data).astype(float)
x_train = data[:, :-1]
print(x_train.shape)
y_train = data[:, -1:]
print(y_train.shape)


num_features = np.shape(x_train)[1]

# Input
tf_train_dataset = tf.constant(x_train, dtype=tf.float32)
tf_train_labels = tf.constant(y_train, dtype=tf.float32)

# Variables
weights = tf.Variable(tf.truncated_normal([num_features, 1], dtype=tf.float32))
biases = tf.Variable(tf.constant(0.0, dtype=tf.float32))

train_prediction = tf.matmul(tf_train_dataset, weights) + biases

# Squared error of the log prices, i.e. error relative to the house price
loss = tf.reduce_mean(tf.square(tf.log(tf_train_labels) - tf.log(train_prediction)))

optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

num_steps = 10001

def accuracy(prediction, labels):
    return ((prediction - labels) ** 2).mean(axis=None)


with tf.Session() as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % 1000 == 0):
            print('Loss at step %d: %f' % (step, l))

Explanation of Critical Change

Your loss function wasn't scaled to the price. The loss function above takes into account that you really only care about the error relative to the original price: being off by $5,000 on a million-dollar house shouldn't be penalized as heavily as being off by $5,000 on a $5,000 house.

The new loss function is:

loss = tf.reduce_mean(tf.square(tf.log(tf_train_labels) - tf.log(train_prediction)))
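
To see the effect numerically, here is a small sketch of my own (not part of the original answer) comparing a plain squared error with the squared log error for the two cases described above:

import numpy as np

# Off by $5,000 on a $1,000,000 house vs. off by $5,000 on a $5,000 house
cases = [(1000000.0, 1005000.0), (5000.0, 10000.0)]
for actual, predicted in cases:
    squared_error = (predicted - actual) ** 2                      # 25,000,000 in both cases
    squared_log_error = (np.log(actual) - np.log(predicted)) ** 2  # ~0.000025 vs. ~0.48
    print(actual, predicted, squared_error, squared_log_error)

The plain squared error is identical for both houses, while the log-space error is about four orders of magnitude larger for the cheap house, so the optimizer is pushed to fix relative errors rather than absolute dollar errors.
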
  • Your new loss function converges. But the predictions are very bad even after 50,000 steps. I did some more digging and I found that taking the RMSE of the log of the predicted and actual costs does a better job of calculating the errors in expensive and cheap houses equally. – Apara Jul 06 '17 at 04:18
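
For reference, the RMSE-of-the-logs loss mentioned in this comment is only a small change to the answer's graph. A sketch, reusing the tensors defined in the working code above:

# Root of the mean squared difference of the logs (an RMSLE-style loss).
# Since sqrt is monotonic it keeps the same minimum as the answer's loss;
# mainly the gradient scale during training changes.
loss = tf.sqrt(tf.reduce_mean(tf.square(tf.log(tf_train_labels) - tf.log(train_prediction))))
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)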