Problems implementing an XOR gate with Neural Nets in Tensorflow

Question

I want to make a trivial neural network, it should just implement the XOR gate. I am using the TensorFlow library, in python. For an XOR gate, the only data I train with, is the complete truth table, that should be enough right? Over optimization is what I will expect to happen very quickly. Problem with the code is that the weights and biases do not update. Somehow it still gives me 100% accuracy with zero for the biases and weights.

x = tf.placeholder("float", [None, 2])
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)

y_ = tf.placeholder("float", [None,1])


print "Done init"

cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.75).minimize(cross_entropy)

print "Done loading vars"

init = tf.initialize_all_variables()
print "Done: Initializing variables"

sess = tf.Session()
sess.run(init)
print "Done: Session started"

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1], [0], [0], [0]])


acc=0.0
while acc<0.85:
  for i in range(500):
      sess.run(train_step, feed_dict={x: xTrain, y_: yTrain})


  print b.eval(sess)
  print W.eval(sess)


  print "Done training"


  correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

  accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

  print "Result:"
  acc= sess.run(accuracy, feed_dict={x: xTrain, y_: yTrain})
  print acc

B0 = b.eval(sess)[0]
B1 = b.eval(sess)[1]
W00 = W.eval(sess)[0][0]
W01 = W.eval(sess)[0][1]
W10 = W.eval(sess)[1][0]
W11 = W.eval(sess)[1][1]

for A,B in product([0,1],[0,1]):
  top = W00*A + W01*A + B0
  bottom = W10*B + W11*B + B1
  print "A:",A," B:",B
  # print "Top",top," Bottom: ", bottom
  print "Sum:",top+bottom

I am following the tutorial from http://tensorflow.org/tutorials/mnist/beginners/index.md#softmax_regressions and in the final for-loop I am printing the results form the matrix(as described in the link).

Can anybody point out my error and what I should do to fix it?

score 22 · Accepted Answer · edited Apr 13 '17 at 12:44

There are a few issues with your program.

The first issue is that the function you're learning isn't XOR - it's NOR. The lines:

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1], [0], [0], [0]])

...should be:

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[0], [1], [1], [0]])

The next big issue is that the network you've designed isn't capable of learning XOR. You'll need to use a non-linear function (such as tf.nn.relu() and define at least one more layer to learn the XOR function. For example:

x = tf.placeholder("float", [None, 2])
W_hidden = tf.Variable(...)
b_hidden = tf.Variable(...)
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

W_logits = tf.Variable(...)
b_logits = tf.Variable(...)
logits = tf.matmul(hidden, W_logits) + b_logits

A further issue is that initializing the weights to zero will prevent your network from training. Typically, you should initialize your weights randomly, and your biases to zero. Here's one popular way to do it:

HIDDEN_NODES = 2

W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES], stddev=1./math.sqrt(2)))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))

W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2], stddev=1./math.sqrt(HIDDEN_NODES)))
b_logits = tf.Variable(tf.zeros([2]))

Putting it all together, and using TensorFlow routines for cross-entropy (with a one-hot encoding of yTrain for convenience), here's a program that learns XOR:

import math
import tensorflow as tf
import numpy as np

HIDDEN_NODES = 10

x = tf.placeholder(tf.float32, [None, 2])
W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES], stddev=1./math.sqrt(2)))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2], stddev=1./math.sqrt(HIDDEN_NODES)))
b_logits = tf.Variable(tf.zeros([2]))
logits = tf.matmul(hidden, W_logits) + b_logits

y = tf.nn.softmax(logits)

y_input = tf.placeholder(tf.float32, [None, 2])

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y_input)
loss = tf.reduce_mean(cross_entropy)

train_op = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

init_op = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init_op)

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

for i in xrange(500):
  _, loss_val = sess.run([train_op, loss], feed_dict={x: xTrain, y_input: yTrain})

  if i % 10 == 0:
    print "Step:", i, "Current loss:", loss_val
    for x_input in [[0, 0], [0, 1], [1, 0], [1, 1]]:
      print x_input, sess.run(y, feed_dict={x: [x_input]})

Note that this is probably not the most efficient neural network for computing XOR, so suggestions for tweaking the parameters are welcome!

Thank you very much for the detailed response. This was very helpful! — Cristian F, Nov 17 '15 at 12:58
Not quite, as it doesn't model the carry-output bit. There are two output bits from a half-adder (the sum and the carry-output), so you'd need 4 output classes (assuming you use the one-hot encoding for output class that I used here). For a full adder you'd also need to model a third input. Both of these should be simple additions to the code I posted though. — mrry, Nov 18 '15 at 01:27
Is there a particular reason why you set the stddev for the random variable initilization to 1./sqrt(2) ? — Cristian F, Nov 18 '15 at 03:45
The intention was to do Glorot-Bengio initialization, but I may have misremembered the formula. Here's a link to the relevant paper: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf — mrry, Nov 18 '15 at 03:53

Problems implementing an XOR gate with Neural Nets in Tensorflow

1 Answers1

Linked