
I am trying to train an MLP on sparse data to make a forecast. However, the forecast on the test data is the same value for all observations. Once I omit the activation function from each layer, the outputs start to differ. My code is below:

# imports
import numpy as np
import tensorflow as tf
import random
from scipy.sparse import rand


# Parameters
learning_rate = 0.1
training_epochs = 50
batch_size = 100

# Network Parameters
m = 1000  # number of features
n = 5000  # number of observations

hidden_layers = [5,2,4,1,6]
n_layers = len(hidden_layers)
n_input = m
n_classes = 1 # it's a regression problem

X_train = rand(n, m, density=0.2, format='csr').todense().astype(np.float32)
Y_train = np.random.randint(4, size=n)


X_test = rand(200, m, density=0.2, format='csr').todense().astype(np.float32)
Y_test = np.random.randint(4, size=200)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None])


# Store layers weight & bias
weights = {}
biases = {}
weights['h1'] = tf.Variable(tf.random_normal([n_input, hidden_layers[0]]))  # first weight matrix
biases['b1'] = tf.Variable(tf.random_normal([hidden_layers[0]]))

for i in xrange(2, n_layers+1):
    weights['h'+str(i)] = tf.Variable(tf.random_normal([hidden_layers[i-2], hidden_layers[i-1]]))
    biases['b'+str(i)] = tf.Variable(tf.random_normal([hidden_layers[i-1]]))

weights['out'] = tf.Variable(tf.random_normal([hidden_layers[-1], 1]))  # matrix between last hidden layer and output
biases['out'] = tf.Variable(tf.random_normal([1]))


# Create model
def multilayer_perceptron(_X, _weights, _biases):
    layer_begin = tf.nn.relu(tf.add(tf.matmul(_X, _weights['h1'], a_is_sparse=True), _biases['b1']))

    for layer in xrange(2, n_layers+1):
        layer_begin = tf.nn.relu(tf.add(tf.matmul(layer_begin, _weights['h'+str(layer)]), _biases['b'+str(layer)]))
        #layer_end = tf.nn.dropout(layer_begin, 0.3)

    return tf.matmul(layer_begin, _weights['out']) + _biases['out']


# Construct model
pred = multilayer_perceptron(x, weights, biases)



# Define loss and optimizer
rmse = tf.reduce_sum(tf.abs(y-pred))/tf.reduce_sum(tf.abs(y)) # relative L1 error (despite the name, this is not RMSE)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(rmse) # Adam Optimizer

# Initializing the variables
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)

    #training
    for step in xrange(training_epochs):

        # Generate a minibatch.
        start = random.randrange(1, n - batch_size)
        #print start
        batch_xs = X_train[start:start+batch_size, :]
        batch_ys = Y_train[start:start+batch_size]

        #printing
        _,rmseRes = sess.run([optimizer, rmse] , feed_dict={x: batch_xs, y: batch_ys} )
        if step % 20 == 0:
             print "rmse [%s] = %s" % (step, rmseRes)


    #testing
    pred_test = multilayer_perceptron(X_test, weights, biases)
    print "prediction", pred_test.eval()[:20] 
    print  "actual = ", Y_test[:20]

PS: I am generating my data randomly just to reproduce the error. My real data is indeed sparse, and pretty similar to the randomly generated data. The problem I want to solve: the MLP gives the same prediction for all observations in the test data.
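One quick check that fits this symptom (constant output only when the activation is present) is whether the first hidden layer's ReLU units have all gone to zero, i.e. "died", which can happen with a large learning rate. A minimal diagnostic sketch, meant to sit inside the with tf.Session() block above (hence the indentation); the graph names come from the code above:

    # Evaluate the first hidden layer on the test data. If (almost) no unit
    # is ever positive, the ReLUs are dead and the bias terms alone drive a
    # constant output.
    h1_test = tf.nn.relu(tf.add(tf.matmul(X_test, weights['h1'], a_is_sparse=True), biases['b1']))
    print "fraction of positive ReLU activations:", (h1_test.eval() > 0).mean()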

chikhawi9
  • Maybe this is my own ignorance of the problem, but is it reasonable to expect that an MLP will converge when trained on completely random data? And even if it did, would it be reasonable to expect that the learned parameters would yield better-than-random accuracy on a separate, randomly generated test set? – Aenimated1 Apr 11 '16 at 20:39
  • Thank you. There is in fact a point in what you said, but I am generating the data randomly here just to reproduce the error. It's not my real data, but it's close to it: it's sparse. So the problem is not whether the neural net converges; the issue is the "same prediction" for all observations in the test data. – chikhawi9 Apr 12 '16 at 09:51

1 Answer


That's a sign that your training failed. With GoogLeNet ImageNet training I've seen it label everything as "nematode" when started with a bad choice of hyper-parameters. Things to check -- does your training loss decrease? If it doesn't decrease, try different learning rates/architectures. If it decreases to zero, maybe your loss is wrong, as was the case here.
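For concreteness, a minimal sketch of that loss check, reusing x, y, rmse, X_train, Y_train, n, batch_size and training_epochs from the question's code; the learning rates in the sweep are illustrative only:

# Sweep a few learning rates and watch whether the training loss decreases.
for lr in [0.1, 0.01, 0.001]:
    train_op = tf.train.AdamOptimizer(learning_rate=lr).minimize(rmse)
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())  # fresh weights for each run
        for step in xrange(training_epochs):
            start = random.randrange(1, n - batch_size)
            _, loss_val = sess.run([train_op, rmse],
                                   feed_dict={x: X_train[start:start+batch_size, :],
                                              y: Y_train[start:start+batch_size]})
            if step % 10 == 0:
                print "lr = %s, step = %s, loss = %s" % (lr, step, loss_val)

One concrete thing to check about the loss itself: y has shape [None] while pred has shape [None, 1], so y - pred broadcasts to a [batch, batch] matrix rather than an elementwise difference; applying tf.reshape(pred, [-1]) before computing the loss makes it elementwise.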

Yaroslav Bulatov
  • Thank you. Yes, I tried several hyper-parameters (learning rates / number of layers / number of neurons per layer). Sometimes I got different predictions ("small" networks), but in most cases either the same predictions or predictions far from reality. The loss decreases slowly (sometimes it increases; it is not stable at all). I will try other loss functions as you recommended. – chikhawi9 Apr 12 '16 at 11:03