
I was playing around with TensorFlow and looking at the tutorial from:

https://github.com/aymericdamien/TensorFlow-Examples/tree/0.11/examples/3_NeuralNetworks

Because I did not want to use the MNIST database, I changed the script to run on data I created myself, with 8,000 training samples. The evaluation is done with 300 test samples. The output is a binary classification. Bear in mind that I have just dived into machine learning and my knowledge is quite limited for now.

The script runs fine, but the cost gets stuck at a very high value and does not converge to 0. First, is this normal? How can I improve it? Did I do something wrong? Second, the accuracy is not very good either; is that due to the bad convergence? Maybe 8,000 samples are not enough to train the model, or maybe the values are too scattered to get a better accuracy.
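
One thing I have started to wonder about is the scale of my inputs, since I do not normalize them anywhere. Here is a minimal sketch of how I could standardize the seven feature columns (assuming they are all plain numeric values; it reuses the trX/teX arrays from the script below and is not applied in it):

# Standardize each feature column to zero mean and unit variance,
# using only the statistics of the training set (sketch, not in my script yet).
feat_mean = trX.mean(axis=0)
feat_std = trX.std(axis=0) + 1e-8   # avoid division by zero for constant columns
trX = (trX - feat_mean) / feat_std
teX = (teX - feat_mean) / feat_std  # scale the test set with the training statistics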

I found a similar problem here:

tensorflow deep neural network for regression always predict same results in one batch

but I do not understand why or how this problem applies to me.
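
As far as I understand, that question is about the network predicting the same value for every sample. To check whether the same thing happens to me, I think I could count how often each class is predicted on the test set, with something like this at the end of the session (just a sketch; pred, x, sess and teX are the names from my script below):

# Count how many test samples end up in each of the two classes.
# If one class gets (almost) all predictions, the network has collapsed
# to a constant output, like in the linked question.
predicted_classes = sess.run(tf.argmax(pred, 1), feed_dict={x: teX})
print("Predictions per class:", np.bincount(predicted_classes, minlength=2))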

Could someone help me please?

Here is the output at the start:

Starting 1st session...
Epoch: 0001 cost= 39926820.730

and at the end I get:

Epoch: 0671 cost= 64.798
Epoch: 0681 cost= 64.794
Epoch: 0691 cost= 64.791
Optimization Finished!
Accuracy: 0.716621

The code is as follows:

import tensorflow as tf
import pandas as pd
import numpy as np

inputData = pd.read_csv('./myInputDataNS.csv', header=None)
runData = pd.read_csv('./myTestDataNS.csv', header=None)

# Columns 0-6 are the input features, column 7 is the 0/1 label.
trX, trY = inputData.iloc[:, :7].values, inputData.iloc[:, 7].values
# Turn the label column into two-column one-hot vectors [1 - y, y].
trY = trY.reshape(trY.shape[0], 1)
trY = np.concatenate((1 - trY, trY), axis=1)

teX, teY = runData.iloc[:, :7].values, runData.iloc[:, 7].values
teY = teY.reshape(teY.shape[0], 1)
teY = np.concatenate((1 - teY, teY), axis=1)


# Parameters
learning_rate = 0.001
training_epochs = 700
batch_size = 100
display_step = 10

# Network Parameters
n_hidden_1 = 320
n_hidden_2 = 320
n_hidden_3 = 320
n_input = 7
n_classes = 2 # (0 or 1)

# Placeholders for the input features and the one-hot labels.
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


def multilayer_perceptron(x, weights, biases):
    # Three fully connected hidden layers with ReLU activations.
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.relu(layer_3)

    # Linear output layer: raw logits, the softmax is applied inside the loss.
    out_layer = tf.matmul(layer_3, weights['out']) + biases['out']
    return out_layer

weights = {
    'h1': tf.Variable(tf.random_normal([len(trX[0]), n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_hidden_3, n_classes]))
}

biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'b3': tf.Variable(tf.random_normal([n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

pred = multilayer_perceptron(x, weights, biases)

# softmax_cross_entropy_with_logits applies the softmax itself, so pred holds raw logits.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

init = tf.global_variables_initializer()


print("Starting 1st session...")

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(training_epochs):
        epoch_loss = 0
        i = 0
        while i < len(trX):
            start = i
            end = i + batch_size
            batch_x = np.array(trX[start:end])
            batch_y = np.array(trY[start:end])
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            epoch_loss += c
            i += batch_size
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.3f}".format(epoch_loss))
    print("Optimization Finished!")

    # Evaluate the accuracy on the test set.
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: teX, y: teY}))
  • What data are you using in your training set? Can you make it available? There are several possibilities: maybe the dataset is too complex for a feed-forward network, maybe it has too many wrong labels (yes, I did that once hehe), maybe it needs more training, maybe it is not big enough. With the dataset I can run some tests and try to help. I'm not an expert either; maybe someone with more knowledge can give a faster answer. – Will Glück May 12 '17 at 11:57
  • Yes, I can make them available, no problem; I just need to find out how to attach a file here ;) – Joesmaker May 12 '17 at 12:03
  • Also, sorry, but what do you mean by too many wrong labels? Thanks for your help – Joesmaker May 12 '17 at 12:12
  • Yes, please make the problem reproducible so people can help; questions like this are very hard to answer. I suggest looking into TensorBoard so you have a better idea of your network. – dv3 May 12 '17 at 12:14
  • Please find the file here: https://www.4shared.com/folder/2FNnEIGd/_online.html – Joesmaker May 12 '17 at 12:38
  • Hi there. By "wrong labels" I mean literally wrong examples for training. A stupid example: if I'm training a network to execute the logical AND and I have 1 and 1 as input, the result should be 1, but the training set has an example that says 1 and 1 is 0. Tomorrow night I'm going to do some testing on your dataset to see if I come to any conclusions. – Will Glück May 15 '17 at 14:06
  • Oh ok, I understand. I also read a bit more and found some very interesting topics on Stack Overflow about similar problems: http://stackoverflow.com/questions/40709074/binary-classification-in-tensorflow-unexpected-large-values-for-loss-and-accura and http://stackoverflow.com/questions/40709870/changing-accuracy-value-and-no-change-in-loss-value-in-binary-classification-usi, but they did not improve my loss. Thank you for your help in any case. – Joesmaker May 15 '17 at 14:26
  • Man, I will be honest: I did some tests yesterday and today, and I even called a friend to help me, but I got nowhere. I think the problem is in your data; as I don't have any insight into it, it's hard to say. I changed the number of neurons and normalized the data, but I didn't manage to get better results. Maybe there is not enough correlation between the inputs? Sorry – Will Glück May 17 '17 at 21:57
  • Thanks a lot for your help. You are probably right about my data and the lack of correlation. A few questions, though: would more data help the algorithm find a "certain" correlation? Would another technique such as an SVM be a better choice with only 8,000 samples? Thanks a lot again. – Joesmaker May 18 '17 at 12:29
