XOR gate with a neural network

Question

I was trying to implement an XOR gate with TensorFlow. I succeeded, but I don't fully understand why it works. I got help from Stack Overflow posts here and here. I implemented it both with and without one-hot true outputs. Here is the network as I understand it, to make things clear.

My Question #1: Notice the ReLU function and the sigmoid function. Why do we need them (specifically the ReLU function)? You may say it is to achieve non-linearity. I understand how ReLU achieves non-linearity; I got the answer from here. From what I understand, the difference between using ReLU and not using it is this (see the picture). [I tested the tf.nn.relu function. The output is like this.]

Now, if the first function works, why not the second? From my perspective, ReLU achieves non-linearity by combining multiple linear functions. So both are linear functions (the upper two). If the first one achieves non-linearity, the second one should too, shouldn't it? The question is: without ReLU, why does the network get stuck?

XOR gate with one hot true outputs

import numpy as np
import tensorflow as tf

hidden1_neuron = 10

def Network(x, weights, bias):
    layer1 = tf.nn.relu(tf.matmul(x, weights['h1']) + bias['h1'])
    layer_final = tf.matmul(layer1, weights['out']) + bias['out']
    return layer_final

weight = {
    'h1' : tf.Variable(tf.random_normal([2, hidden1_neuron])),
    'out': tf.Variable(tf.random_normal([hidden1_neuron, 2]))
}
bias = {
    'h1' : tf.Variable(tf.random_normal([hidden1_neuron])),
    'out': tf.Variable(tf.random_normal([2]))
}

x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 2])

net = Network(x, weight, bias)

# Keyword arguments avoid mixing up logits and labels across TensorFlow versions.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=net, labels=y)
loss = tf.reduce_mean(cross_entropy)

train_op = tf.train.AdamOptimizer(0.2).minimize(loss)

init_op = tf.initialize_all_variables()

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

with tf.Session() as sess:
    sess.run(init_op)
    for i in range(5000):
        train_data = sess.run(train_op, feed_dict={x: xTrain, y: yTrain})
        loss_val = sess.run(loss, feed_dict={x: xTrain, y: yTrain})
        if i % 500 == 0:
            print(loss_val)

    result = sess.run(net, feed_dict={x:xTrain})
    print(result)

The code above implements the XOR gate with one-hot true outputs. If I take out tf.nn.relu, the network gets stuck. Why?

My Question #2: How can I tell whether a network is going to get stuck in a local minimum (or at some value)? Is it from the plot of the cost function (or loss function)? For the network above, I used cross entropy as the loss function. I could not find a plot of the cross-entropy function. (If you can provide one, that would be very helpful.)

My Question #3: Notice the line hidden1_neuron = 10 in the code. It sets the number of neurons in the hidden layer to 10. Reducing the number of neurons to 5 makes the network get stuck. So what should the number of neurons in the hidden layer be?

The output when the network works the way it is supposed to:

2.42076
0.000456363
0.000149548
7.40216e-05
4.34194e-05
2.78939e-05
1.8924e-05
1.33214e-05
9.62602e-06
7.06308e-06
[[ 7.5128479  -7.58900356]
 [-5.65254211  5.28509617]
 [-6.96340656  6.62380219]
 [ 7.26610374 -5.9665451 ]]

The output when the network gets stuck:

1.45679
0.346579
0.346575
0.346575
0.346574
0.346574
0.346574
0.346574
0.346574
0.346574
[[ 15.70696926 -18.21559143]
 [ -7.1562047    9.75774956]
 [ -0.03214722  -0.03214724]
 [ -0.03214722  -0.03214724]]
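
As a sanity check on the stuck output above, the plateau value can be reproduced from the printed logits. This is a NumPy sketch added for illustration, not part of the original TensorFlow run. It recomputes the mean softmax cross entropy by hand. The first two inputs are classified almost perfectly, while the last two inputs receive nearly identical logits for both classes, so each of them contributes about ln 2 and the mean comes out close to ln(2)/2.

```python
import numpy as np

# Logits and one-hot labels copied from the stuck run above.
logits = np.array([
    [15.70696926, -18.21559143],
    [-7.1562047, 9.75774956],
    [-0.03214722, -0.03214724],
    [-0.03214722, -0.03214724],
])
labels = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)

# Numerically stable log-softmax, then per-example cross entropy.
shifted = logits - logits.max(axis=1, keepdims=True)
log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
per_example = -(labels * log_softmax).sum(axis=1)

mean_loss = per_example.mean()
print(per_example)                # two values near 0, two values near ln 2
print(mean_loss, np.log(2) / 2)   # both close to the printed 0.346574
```

Identical outputs for [1, 0] and [1, 1] mean the network cannot tell those two inputs apart, which is why the loss cannot drop below this level from that state.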

I answered a similar question: http://stackoverflow.com/questions/38561182/neural-network-xor-gate-not-learning/38767930#38767930 — Abhijay Ghildyal, Aug 05 '16 at 09:50

jorgenkg · Accepted Answer · 2016-01-02T09:08:05.080


Question 1

Both the ReLU and the sigmoid function are non-linear. In contrast, the function drawn to the right of the ReLU function is linear. Stacking layers with linear activation functions still yields a linear network, because a composition of linear (affine) maps is itself a linear map.

Therefore, the network gets stuck: without ReLU it is effectively trying to fit a linear model to a non-linear problem.
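
Both claims can be checked numerically. The following is a minimal NumPy sketch, not the asker's TensorFlow code. The weights are random placeholders whose shapes mirror the 2-10-2 network, and least squares on 0/1 targets is used as a simple stand-in for "the best a linear model can do". It shows that two layers without an activation collapse into one linear layer, and that the best linear fit to XOR predicts 0.5 for every input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with no activation function (shapes mirror the 2-10-2 network).
W1, b1 = rng.normal(size=(2, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 2)), rng.normal(size=2)

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Forward pass through both linear layers.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same map written as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b
layers_collapse = np.allclose(two_layers, one_layer)

# Best linear fit to XOR (least squares, with a bias column appended).
y_xor = np.array([0.0, 1.0, 1.0, 0.0])
X = np.hstack([x, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(X, y_xor, rcond=None)
linear_pred = X @ coef

print(layers_collapse)           # True
print(np.round(linear_pred, 3))  # every prediction is 0.5
```

Adding more linear layers or more hidden units does not change this outcome; only a non-linear activation between the layers does.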

Question 2

Yes, you have to watch how the error evolves during training. In larger problem instances, you would typically track the error on a held-out test set, measuring the accuracy of the network after each period of training.
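
One rough way to automate this kind of monitoring is to flag a run whose loss has stopped changing while still being high. The sketch below is only an illustration: the helper names `has_plateaued` and `looks_stuck`, the window size, and both thresholds are assumptions chosen for this example, not part of TensorFlow. The loss values are the ones printed in the question.

```python
def has_plateaued(losses, window=3, rel_tol=1e-3):
    """True if the last `window` losses differ by less than rel_tol (relative)."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    spread = max(recent) - min(recent)
    return spread <= rel_tol * max(abs(recent[-1]), 1e-12)


def looks_stuck(losses, loss_threshold=0.1):
    """Hypothetical heuristic: a flat loss curve at a high value suggests a stuck run."""
    return has_plateaued(losses) and losses[-1] > loss_threshold


# Loss values printed every 500 steps in the question.
working = [2.42076, 0.000456363, 0.000149548, 7.40216e-05, 4.34194e-05,
           2.78939e-05, 1.8924e-05, 1.33214e-05, 9.62602e-06, 7.06308e-06]
stuck = [1.45679, 0.346579, 0.346575, 0.346575, 0.346574,
         0.346574, 0.346574, 0.346574, 0.346574, 0.346574]

print(looks_stuck(working))  # False
print(looks_stuck(stuck))    # True
```

A check like this only detects a plateau after the fact; it does not predict in advance whether a given initialization will end up in one.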

Question 3

The XOR problem requires at least 2 input nodes, 2 hidden nodes, and 1 output node; that is, five nodes are required to correctly model the XOR problem with a simple neural network.
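
To see that two hidden units are enough in principle, here is a 2-2-1 ReLU network with hand-picked weights that reproduces XOR exactly. This is a NumPy sketch for illustration; the weights are chosen by hand rather than learned, and the single linear output differs from the one-hot setup in the question.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


# Hand-picked weights for a 2-2-1 network (illustrative, not learned).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])    # both hidden units compute x1 + x2
b1 = np.array([0.0, -1.0])     # the second unit is active only when both inputs are 1
W2 = np.array([[1.0], [-2.0]])
b2 = np.array([0.0])

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = relu(x @ W1 + b1)
out = (hidden @ W2 + b2).ravel()

print(out)  # [0. 1. 1. 0.]
```

That such weights exist does not guarantee that gradient descent finds them from a random initialization, so a small network can still get stuck in practice.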

edited Jan 02 '16 at 09:08

answered Jan 01 '16 at 15:33

jorgenkg


Understood the first one. For the second one, can I predict that the neural network may get stuck in a local minimum? For example, if you look at the Rastrigin function, you can see that it has many local minima. Can the same be said for cross entropy? I could not find the plot. For the third one, I am using 5 hidden-layer neurons, so 2 input, 5 hidden, 2 output (as one-hot true). So 9 neurons or nodes doesn't work? Why is that? – Shubhashis Jan 01 '16 at 15:48
No, you cannot foresee whether you are approaching a local optimum, but you will most likely wind up in one of them. There are a few techniques to (attempt to) avoid local minima, such as adding momentum and using dropout. #3: The XOR problem is a difficult problem for a neural network to learn, and it is not clear why your particular network struggles to perform efficiently with a 2-5-2 topology. – jorgenkg Jan 01 '16 at 15:59
I mean, when you design a neural network, is it possible to know what the number of neurons in the hidden layer should be? Or do you assume a large value and then reduce it to see the impact, kind of like trial and error? BTW, when I implemented the XOR gate without one-hot true outputs (with sigmoid), it could learn with 2 input, 2 hidden, 1 output. – Shubhashis Jan 01 '16 at 16:06
It is considered somewhat an art to guess/estimate the number of neurons and layers that are required for a given problem, and the answer is unfortunately: no. – jorgenkg Jan 01 '16 at 16:59
