Do I need to use one_hot encoding if my output variable is binary?

Question

I am developing a TensorFlow network based on the MNIST for beginners template. Basically, I am trying to implement a simple logistic regression in which 10 continuous variables predict a binary outcome, so my inputs are 10 values between 0 and 1, and my target variable (Y_train and Y_test in the code) is a 1 or 0.

My main problem is that there is no change in accuracy no matter how many training steps I run -- it is 0.276667 whether I run 100 or 31240 steps. Additionally, when I switch from the softmax to simply matmul to generate my Y values, I get 0.0 accuracy, which suggests there may be something wrong with my x*W + b calculation. The inputs read out just fine.

What I'm wondering is a) whether I'm not calculating Y values properly because of an error in my code and b) if that's not the case, is it possible that I need to implement the one_hot vectors -- even though my output already takes the form of 0 or 1. If the latter is the case, where do I include the one_hot=TRUE function in my generation of the target values vector? Thanks!

import numpy as np
import tensorflow as tf
train_data = np.genfromtxt("TRAINDATA2.txt", delimiter="    ")
train_input = train_data[:, :10]
train_input = train_input.reshape(31240, 10)
X_train = tf.placeholder(tf.float32, [31240, 10])

train_target = train_data[:, 10]
train_target = train_target.reshape(31240, 1)
Y_train = tf.placeholder(tf.float32, [31240, 1])

test_data = np.genfromtxt("TESTDATA2.txt", delimiter = "    ")
test_input = test_data[:, :10]
test_input = test_input.reshape(7800, 10)
X_test = tf.placeholder(tf.float32, [7800, 10])

test_target = test_data[:, 10]
test_target = test_target.reshape(7800, 1)
Y_test = tf.placeholder(tf.float32, [7800, 1])

W = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([1]))

Y_obt = tf.nn.softmax(tf.matmul(X_train, W) + b)
Y_obt_test = tf.nn.softmax(tf.matmul(X_test, W) + b)

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_obt, labels=Y_train)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for _ in range(31240):
    sess.run(train_step, feed_dict={X_train: train_input, Y_train: train_target})

correct_prediction = tf.equal(tf.round(Y_obt_test), Y_test)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={X_test: test_input, Y_test: test_target}))

ml4294 · Accepted Answer · 2017-08-11T19:18:26.697

4

Since you map your values to a target with one element, you should not use softmax cross entropy. The softmax operation transforms its input into a probability distribution whose entries sum to 1, so with a single output element it will simply output 1 every time, because that is the only way to turn one value into a probability distribution. You should instead use tf.nn.sigmoid_cross_entropy_with_logits() (which is used for binary classification), remove the softmax from Y_obt so that the loss receives the raw logits, and replace the softmax with tf.sigmoid() in Y_obt_test.
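To make the first point concrete, here is a minimal NumPy sketch (not the TensorFlow code from the question) of a softmax applied to a single-column output. The `softmax` helper and the example logits are made up for illustration. If the same thing happens in the original graph, `tf.round(Y_obt_test)` is always 1, and the reported accuracy would simply be the fraction of test targets equal to 1.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for four examples, shape [4, 1], i.e. the shape that
# tf.matmul(X, W) + b produces when W has shape [10, 1].
single_logits = np.array([[-3.0], [0.0], [0.5], [7.0]])

# With one element per row, the only distribution that sums to 1 is [1.0].
single_probs = softmax(single_logits)
print(single_probs.ravel())
```

Whatever values the weights take, the single-column softmax output carries no information about the input.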

Another way is to one-hot encode your targets and use a network with a two-element output. In this case, you should use tf.nn.softmax_cross_entropy_with_logits(), but remove the tf.nn.softmax() from Y_obt, since the softmax cross entropy expects unscaled logits (https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits). For Y_obt_test, you should keep the softmax in this case, since the test predictions still need to be probabilities.
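The one-hot alternative can be sketched in plain NumPy as follows. The helper names `one_hot` and `softmax_cross_entropy` are hypothetical stand-ins, not TensorFlow functions, and the all-zero logits only mimic a freshly initialised two-column `W`.

```python
import numpy as np

def one_hot(labels, num_classes=2):
    """Turn labels of shape [n] into one-hot rows of shape [n, num_classes]."""
    return np.eye(num_classes)[labels.astype(int)]

def softmax_cross_entropy(logits, onehot_labels):
    """Per-example cross entropy computed directly from unscaled logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot_labels * log_probs).sum(axis=1)

labels = np.array([0.0, 1.0, 1.0, 0.0])   # binary targets as stored in the data file
targets = one_hot(labels)                 # shape [4, 2]: 0 -> [1, 0], 1 -> [0, 1]

# Two-element output per example, as produced by a W of shape [10, 2].
logits = np.zeros((4, 2))
losses = softmax_cross_entropy(logits, targets)
print(targets)
print(losses)  # each loss equals log(2) while both logits are equal
```

Note that the loss takes the logits themselves; applying a softmax first and then passing the result in would apply the softmax twice.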

Another thing: It might also help to take the mean of the cross entropies with cross_entropy = tf.reduce_mean(tf.sigmoid_cross_entropy_...).
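As a rough illustration of what the averaged loss computes, the sketch below reproduces the per-example sigmoid cross entropy in NumPy, using the numerically stable form given in the TensorFlow documentation, max(x, 0) - x*z + log(1 + exp(-|x|)), and then takes the mean. The function name and the example values are assumptions made for this sketch.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Stable per-example loss: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return (np.maximum(logits, 0.0)
            - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Hypothetical raw outputs (x*W + b) and binary targets, both of shape [3, 1].
logits = np.array([[2.0], [-1.0], [0.0]])
labels = np.array([[1.0], [0.0], [1.0]])

per_example = sigmoid_cross_entropy(logits, labels)  # one loss per row
mean_loss = per_example.mean()                       # scalar handed to the optimizer
print(per_example.ravel(), mean_loss)
```

Averaging keeps the loss on the same scale regardless of how many examples are fed in, which makes a fixed learning rate easier to reason about.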

edited Aug 11 '17 at 19:18

answered Aug 11 '17 at 19:10


Thanks so much for responding. I replaced the cross_entropy function with the sigmoid function you suggested, removed softmax from Y_obt, and replaced softmax with sigmoid in Y_obt_test. Accuracy with 31240 steps is now 0.72333, up from 0.318333 with 100. Looks like success! If you have time for one more question: is the sigmoid function only necessary for the Y_obt_test array because test data are not subjected to the gradient descent process used in training? Any clarification would be appreciated -- thanks again. – mudstick Aug 11 '17 at 20:42
If I get your question correctly, I think you are right. To further clarify the point: `tf.nn.sigmoid_cross_entropy_with_logits()` computes the sigmoid and then the cross entropy, but only for the training set. The test set is not used for the training, and therefore the cross entropy is not computed. Since sigmoid cross entropy does both things in one command (for the training set), you have to perform a sigmoid manually for the test set in order to bring the test set to the correct (i.e. the same) scale. I think this is what you asked, therefore you probably got the relevant point. – ml4294 Aug 12 '17 at 07:07
Super. Thank you. – mudstick Aug 12 '17 at 16:45
Thank you. I also got stuck on a similar issue and had to change my output to one-hot encoding using the code below: `from keras.utils.np_utils import to_categorical`, then `y_train = to_categorical(y_train)`, `y_test = to_categorical(y_test)`, `y_cv = to_categorical(y_cv)`. – nathandrake Jan 09 '20 at 19:19
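The point made in the comments about applying the sigmoid by hand at test time can be sketched in NumPy like this. The logits and targets are invented for illustration; in the question's graph the logits would come from `tf.matmul(X_test, W) + b`.

```python
import numpy as np

def sigmoid(x):
    """Map raw logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw test outputs and binary targets, both of shape [4, 1].
test_logits = np.array([[-2.0], [0.3], [4.0], [-0.1]])
test_targets = np.array([[0.0], [1.0], [0.0], [0.0]])

# The loss function applies the sigmoid internally during training, so at
# test time it has to be applied explicitly before rounding to 0 or 1.
probs = sigmoid(test_logits)
predictions = np.round(probs)
accuracy = np.mean(predictions == test_targets)
print(predictions.ravel(), accuracy)
```

Rounding the sigmoid output is the same as checking whether the raw logit is positive, so thresholding the logits at 0 would give identical predictions.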
