
I'm building a simple neural network that takes 3 values and gives 2 outputs.

I'm getting an accuracy of 67.5% and an average cost of 0.05

I have a training dataset of 1000 examples and 500 testing examples. I plan on making a larger dataset in the near future.

A little while ago I managed to get an accuracy of about 82% and sometimes a bit higher, but the cost was quite high.

I've been experimenting with adding another layer, which is currently in the model, and that is the reason I have got the loss under 1.0.

I'm not sure what is going wrong. I'm new to TensorFlow and NNs in general.

Here is my code:

import tensorflow as tf
import numpy as np
import sys
sys.path.insert(0, '.../Dataset/Testing/')
sys.path.insert(0, '.../Dataset/Training/')
#other files
from TestDataNormaliser import *
from TrainDataNormaliser import *

learning_rate = 0.01
trainingIteration = 10
batchSize = 100
displayStep = 1


x = tf.placeholder("float", [None, 3])
y = tf.placeholder("float", [None, 2])



#layer 1
w1 = tf.Variable(tf.truncated_normal([3, 4], stddev=0.1))
b1 = tf.Variable(tf.zeros([4])) 
y1 = tf.matmul(x, w1) + b1

#layer 2
w2 = tf.Variable(tf.truncated_normal([4, 4], stddev=0.1))
b2 = tf.Variable(tf.zeros([4]))
#y2 = tf.nn.sigmoid(tf.matmul(y1, w2) + b2)
y2 = tf.matmul(y1, w2) + b2

w3 = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1)) 
b3 = tf.Variable(tf.zeros([2]))
y3 = tf.nn.sigmoid(tf.matmul(y2, w3) + b3) #sigmoid


#output
#wO = tf.Variable(tf.truncated_normal([2, 2], stddev=0.1))
#bO = tf.Variable(tf.zeros([2]))
a = y3 #tf.nn.softmax(tf.matmul(y2, wO) + bO) #y2
a_ = tf.placeholder("float", [None, 2])


#cost function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(a)))
#cross_entropy = -tf.reduce_sum(y*tf.log(a))

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)


#training

init = tf.global_variables_initializer() #initialises tensorflow

with tf.Session() as sess:
    sess.run(init) #runs the initialiser

    writer = tf.summary.FileWriter(".../Logs")
    writer.add_graph(sess.graph)
    merged_summary = tf.summary.merge_all()

    for iteration in range(trainingIteration):
        avg_cost = 0
        totalBatch = int(len(trainArrayValues)/batchSize) #1000/100
        #totalBatch = 10

        for i in range(batchSize):
            start = i
            end = i + batchSize #100

            xBatch = trainArrayValues[start:end]
            yBatch = trainArrayLabels[start:end]

            #feeding training data

            sess.run(optimizer, feed_dict={x: xBatch, y: yBatch})

            i += batchSize

            avg_cost += sess.run(cross_entropy, feed_dict={x: xBatch, y: yBatch})/totalBatch

            if iteration % displayStep == 0:
                print("Iteration:", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(avg_cost))

        #
    print("Training complete")


    predictions = tf.equal(tf.argmax(a, 1), tf.argmax(y, 1))

    accuracy = tf.reduce_mean(tf.cast(predictions, "float"))
    print("Accuracy:", accuracy.eval({x: testArrayValues, y: testArrayLabels}))
Blair Burns

2 Answers


A few important notes:

  • You don't have non-linearities between your layers. This means you're training a network which is equivalent to a single-layer network, just with a lot of wasted computation. This is easily solved by adding a simple non-linearity such as tf.nn.relu after each matmul/+ bias line (e.g. y2 = tf.nn.relu(y2)) for every layer except the last.
  • You are using a numerically unstable cross-entropy implementation. I'd encourage you to use tf.nn.sigmoid_cross_entropy_with_logits and remove your explicit sigmoid call (the input to your sigmoid function is what is generally referred to as the logits, i.e. the log-odds of the predicted probability).
  • It seems you are not shuffling your dataset as you go. This could be particularly bad given your choice of optimizer, which leads us to...
  • Stochastic gradient descent is not great. For a boost without adding too much complication, consider using MomentumOptimizer instead. AdamOptimizer is my go-to, but play around with them.
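
The first point can be checked numerically without TensorFlow. The following is a minimal NumPy sketch, not the asker's trained model: the weights are random stand-ins with the same 3-4-4-2 shapes as in the question, and the helper names (stacked_linear, stacked_relu) are made up for illustration. It suggests that three linear layers stacked without activations compute the same thing as one linear layer, and that adding a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the question's weights (shapes 3 -> 4 -> 4 -> 2).
w1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
w2, b2 = rng.normal(size=(4, 4)), rng.normal(size=4)
w3, b3 = rng.normal(size=(4, 2)), rng.normal(size=2)

x = rng.normal(size=(5, 3))  # a batch of 5 made-up inputs


def stacked_linear(x):
    """Three layers with no activation in between, as in the question."""
    return ((x @ w1 + b1) @ w2 + b2) @ w3 + b3


def relu(z):
    return np.maximum(z, 0.0)


def stacked_relu(x):
    """Same weights, but with a ReLU after each of the first two layers."""
    return relu(relu(x @ w1 + b1) @ w2 + b2) @ w3 + b3


# Collapse the three linear layers into a single weight matrix and bias.
w_single = w1 @ w2 @ w3
b_single = (b1 @ w2 + b2) @ w3 + b3

same_as_single_layer = np.allclose(stacked_linear(x), x @ w_single + b_single)
relu_differs = not np.allclose(stacked_relu(x), x @ w_single + b_single)
print(same_as_single_layer, relu_differs)
```

In the question's model the only non-linearity is the final sigmoid, so by the same algebra the network reduces to a single linear layer followed by a sigmoid.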

When it comes to writing clean, maintainable code, I'd also encourage you to consider the following:

  • Use higher level APIs, e.g. tf.layers. It's good you know what's going on at a variable level, but it's easy to make a mistake with all that replicated code, and the default values with the layer implementations are generally pretty good
  • Consider using the tf.data.Dataset API for your data input. It's a bit scary at first, but it handles a lot of things like batching, shuffling, repeating epochs etc. very nicely
  • Consider using something like the tf.estimator.Estimator API for handling session runs, summary writing and evaluation.

With all those changes, you might have something that looks like the following (I've left your code in so you can roughly see the equivalent lines).

For graph construction:

def get_logits(features):
    """tf.layers API is cleaner and has better default values."""
    # #layer 1
    # w1 = tf.Variable(tf.truncated_normal([3, 4], stddev=0.1))
    # b1 = tf.Variable(tf.zeros([4]))
    # y1 = tf.matmul(x, w1) + b1
    x = tf.layers.dense(features, 4, activation=tf.nn.relu)

    # #layer 2
    # w2 = tf.Variable(tf.truncated_normal([4, 4], stddev=0.1))
    # b2 = tf.Variable(tf.zeros([4]))
    # y2 = tf.matmul(y1, w2) + b2
    x = tf.layers.dense(x, 4, activation=tf.nn.relu)

    # w3 = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1))
    # b3 = tf.Variable(tf.zeros([2]))
    # y3 = tf.nn.sigmoid(tf.matmul(y2, w3) + b3) #sigmoid
    # N.B Don't take a non-linearity here.
    logits = tf.layers.dense(x, 1, activation=None)

    # remove unnecessary final dimension, batch_size * 1 -> batch_size
    logits = tf.squeeze(logits, axis=-1)
    return logits


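def get_loss(logits, labels):
    """tf.nn.sigmoid_cross_entropy_with_logits is numerically stable."""
    # #cost function
    # cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(a)))
    # labels must be floats with the same shape as logits, i.e. [batch_size].
    # Average the per-example losses so the result is a scalar.
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=logits, labels=labels))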


def get_train_op(loss):
    """There are better options than standard SGD. Try the following."""
    learning_rate = 1e-3
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # momentum is a required argument; 0.9 is a common starting value.
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    # optimizer = tf.train.AdamOptimizer(learning_rate)
    return optimizer.minimize(loss)


def get_inputs(feature_data, label_data, batch_size, n_epochs=None,
               shuffle=True):
    """
    Get features and labels for training/evaluation.

    Args:
        feature_data: numpy array of feature data.
        label_data: numpy array of label data
        batch_size: size of batch to be returned
        n_epochs: number of epochs to train for. None will result in repeating
            forever/until stopped
        shuffle: bool flag indicating whether or not to shuffle.
    """
    dataset = tf.data.Dataset.from_tensor_slices(
        (feature_data, label_data))

    dataset = dataset.repeat(n_epochs)
    if shuffle:
        dataset = dataset.shuffle(len(feature_data))
    dataset = dataset.batch(batch_size)
    features, labels = dataset.make_one_shot_iterator().get_next()
    return features, labels
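
The numerical-stability point behind get_loss can be seen without TensorFlow. This is a small pure-Python sketch, assuming a single logit and a single 0/1 label; naive_loss and stable_loss are illustrative names, not library functions. The stable version uses the rearrangement given in TensorFlow's documentation for tf.nn.sigmoid_cross_entropy_with_logits, max(x, 0) - x*z + log(1 + exp(-|x|)).

```python
import math


def naive_loss(logit, label):
    """-[z*log(p) + (1-z)*log(1-p)] with p = sigmoid(logit), computed directly."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def stable_loss(logit, label):
    """The same quantity rearranged: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# For moderate logits the two forms agree.
moderate_naive = naive_loss(2.0, 1.0)
moderate_stable = stable_loss(2.0, 1.0)

# For a confident wrong prediction the naive form breaks down.
try:
    extreme_naive = naive_loss(40.0, 0.0)
except ValueError:
    extreme_naive = None  # sigmoid(40) rounds to 1.0, so log(1 - p) is log(0)
extreme_stable = stable_loss(40.0, 0.0)

print(moderate_naive, moderate_stable, extreme_naive, extreme_stable)
```

Python's math.log raises an error on log(0); in a TensorFlow graph the same situation would typically show up as an inf or NaN loss instead.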

For session running you could use this like you have (what I'd call 'the hard way')...

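# n_epochs is not defined above; reusing the question's trainingIteration is
# one possible choice (an assumption, adjust as needed).
n_epochs = trainingIteration
features, labels = get_inputs(
    trainArrayValues, trainArrayLabels, batchSize, n_epochs, shuffle=True)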
logits = get_logits(features)
loss = get_loss(logits, labels)
train_op = get_train_op(loss)
init = tf.global_variables_initializer()
# monitored sessions have the `should_stop` method, which works with datasets
with tf.train.MonitoredSession() as sess:
    sess.run(init)
    while not sess.should_stop():
        # get both loss and optimizer step in the same session run
        loss_val, _ = sess.run([loss, train_op])
        print(loss_val)
    # save variables etc, do evaluation in another graph with different inputs?

but I think you're better off using a tf.estimator.Estimator, though some people prefer tf.keras.Models.

def model_fn(features, labels, mode):
    logits = get_logits(features)
    loss = get_loss(logits, labels)
    train_op = get_train_op(loss)
    predictions = tf.greater(logits, 0)
    accuracy = tf.metrics.accuracy(labels, predictions)
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, train_op=train_op,
        eval_metric_ops={'accuracy': accuracy}, predictions=predictions)


def train_input_fn():
    return get_inputs(trainArrayValues, trainArrayLabels, batchSize)


def eval_input_fn():
    return get_inputs(
        testArrayValues, testArrayLabels, batchSize, n_epochs=1, shuffle=False)


# Where variables and summaries will be saved to
model_dir = './model'

estimator = tf.estimator.Estimator(model_fn, model_dir)
# max_steps is not defined here: set it to the total number of training
# batches you want to run.
estimator.train(train_input_fn, max_steps=max_steps)

estimator.evaluate(eval_input_fn)

Note if you use estimators the variables will be saved after training, so you won't need to re-train each time. If you want to reset, just delete the model_dir.
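
The line predictions = tf.greater(logits, 0) in model_fn works because the sigmoid of 0 is exactly 0.5, so thresholding the logit at 0 gives the same decision as thresholding the probability at 0.5. A tiny pure-Python check, using made-up logit values:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


logits = [-3.0, -0.5, 0.5, 3.0]  # made-up values for illustration

# What tf.greater(logits, 0) computes, element by element.
by_logit = [z > 0.0 for z in logits]
# The equivalent decision made on the probability instead.
by_probability = [sigmoid(z) > 0.5 for z in logits]

print(by_logit == by_probability)
```

Working on the logit directly avoids computing the sigmoid at prediction time.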

DomJack
  • Thank you for your time and effort. You make some very good points and I much appreciate it. This is my first nn I've written so I don't really know what I'm doing but you raise a good point about saving the variables with tf.estimator, as I wish to implement that later in my project. With the get_logits function, why does the last layer, logits have 3 neurons (from what I understand) when my goal is to output 1 or 0? I certainly have a lot more reading and learning to do if I want to write stronger and cleaner networks. Thanks again. – Blair Burns Mar 17 '18 at 11:23
  • Oops, yep, will update. It should just go to 1 and then get squeezed, i.e. you infer a single number, the sigmoid of which is in the range 0 to 1, with the threshold presumably at 0.5. That corresponds to a logit threshold of 0, i.e. a negative logit is interpreted as false and a positive one as true. – DomJack Mar 17 '18 at 12:51

I see that you are using a softmax loss with sigmoidal activation functions in the last layer. Now let me explain the difference between softmax activations and sigmoidal.

You are now allowing the output of the network to be y=(0, 1), y=(1, 0), y=(0, 0) and y=(1, 1). This is because your sigmoidal activations "squish" each element in y between 0 and 1. Your loss function, however, assumes that your y vector sums to one.

What you need to do here is either use the sigmoid cross-entropy loss, which also penalises the outputs that should be 0 and looks like this:

-tf.reduce_sum(y*tf.log(a))-tf.reduce_sum((1-y)*tf.log(1-a))

Or, if you want a to sum to one, you need to use softmax activations in your final layer (to get your a's) instead of sigmoids, which is implemented like this

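# y3 here should be the final layer's output before any sigmoid is applied
exp_out = tf.exp(y3)
# normalise each row separately so every example's outputs sum to one
a = exp_out/tf.reduce_sum(exp_out, axis=1, keepdims=True)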
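
To make the difference concrete, here is a minimal NumPy sketch with made-up logits (the values and variable names are illustrative only). Element-wise sigmoids produce rows that need not sum to one, whereas a softmax normalised per row always does. Subtracting the row maximum before exponentiating is a common precaution against overflow and does not change the result.

```python
import numpy as np

# Made-up pre-activation outputs for a batch of 3 examples and 2 classes.
logits = np.array([[2.0, -1.0],
                   [0.5, 0.5],
                   [-3.0, 4.0]])

# Element-wise sigmoid: each output lies in (0, 1) independently.
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))

# Row-wise softmax: exponentiate, then normalise each example separately.
shifted = logits - logits.max(axis=1, keepdims=True)
exp_out = np.exp(shifted)
softmax_out = exp_out / exp_out.sum(axis=1, keepdims=True)

sigmoid_row_sums = sigmoid_out.sum(axis=1)
softmax_row_sums = softmax_out.sum(axis=1)
print(sigmoid_row_sums, softmax_row_sums)
```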

Ps. I'm using my phone on a train so please excuse typos

Yngve Moe
  • Neither solution produces an accuracy more than 74%, and the loss is averaging 4000. Thank you for your explanation. I actually just want a binary out (1 or 0) for on and off, not both on and on, etc. Do you have any other suggestions on how to resolve this? – Blair Burns Mar 17 '18 at 07:02
  • The loss is high because I used reduce sum not reduce mean. Also, for classification methods like this, neural networks are not necessarily the best, so 75% is not unreasonable. I'd switch optimiser to ADAM or Nesterov gradient descent and see if that helps. – Yngve Moe Mar 17 '18 at 07:07
  • Also, your predictions do indeed give binary outputs. Furthermore, since you are using only two classes, I'd recommend having a single output neuron in the final layer (which would change how your y is used as well). I do not have time to write up an answer, but try to think about how you'd create a network that performs two-class classification with only one neuron. – Yngve Moe Mar 17 '18 at 07:11
  • Thank you so much for your answer and your time. – Blair Burns Mar 17 '18 at 07:14