
I'm trying to build a simple non-linear model in TensorFlow. I have created this sample data:

x_data = np.arange(-100, 100).astype(np.float32)
y_data = np.abs(x_data + 20.) 

(plot of y_data: a V-shaped curve with its minimum at x = -20)
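For reference, the missing plot can be reproduced with matplotlib, reusing x_data and y_data from above (this snippet is not part of the original question):

import matplotlib.pyplot as plt

plt.plot(x_data, y_data)
plt.xlabel('x_data')
plt.ylabel('y_data')
plt.show()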

I guess this shape should be easily reconstructed using a couple of ReLUs, but I can't figure out how.

So far, my approach is to wrap linear components with ReLUs, but this doesn't run:

W1 = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
W2 = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b1 = tf.Variable(tf.zeros([1]))
b2 = tf.Variable(tf.zeros([1]))

y = tf.nn.relu(W1 * x_data + b1) + tf.nn.relu(W2 * x_data + b2)

Any ideas about how to express this model using ReLUs in TensorFlow?

prl900
  • With `W1, W2, b1, b2 = 1., -1., 20., -20.` it should give you exactly your data (I don't know if that was your question). What do you mean when you say "this doesn't run"? – Olivier Moindrot Apr 15 '16 at 13:03
  • Yes, you are right. My problem was that I was using the wrong optimizer: GradientDescentOptimizer(0.5), which returned "InvalidArgumentError: ReluGrad input is not finite. : Tensor had Inf and NaN values". Using the proposed AdamOptimizer(learning_rate), I'm able to train the two ReLUs to the expected values. Thanks for your help. – prl900 Apr 18 '16 at 01:25

3 Answers


I think you're asking how to combine ReLUs into a working model. Two options are shown below:

Option 1) Input of ReLU1 into ReLU2

This is probably the preferred method. Note that r1 is the input to r2.

x = tf.placeholder('float', shape=[None, 1])
y_ = tf.placeholder('float', shape=[None, 1])

W1 = weight_variable([1, hidden_units])
b1 = bias_variable([hidden_units])
r1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# Input of r1 into r2 (which is just y)
W2 = weight_variable([hidden_units, 1])
b2 = bias_variable([1])
y = tf.nn.relu(tf.matmul(r1,W2)+b2) # ReLU2

Option 2) Add ReLU1 and ReLU2

Option 2 was listed in the original question, but I don't know if this is what you really want... read the full working example below and try it. I think you'll find it doesn't model well.

x = tf.placeholder('float', shape=[None, 1])
y_ = tf.placeholder('float', shape=[None, 1])

W1 = weight_variable([1, hidden_units])
b1 = bias_variable([hidden_units])
r1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# Add r1 to r2 -- won't be able to reduce the error.
W2 = weight_variable([1, hidden_units])
b2 = bias_variable([hidden_units])
r2 = tf.nn.relu(tf.matmul(x, W2) + b2)
y = tf.add(r1,r2)  # Again, ReLU2 is just y

Full Working Example

Below is a full working example. By default it uses Option 1; Option 2 is also included in the comments.

 from __future__ import print_function
 import tensorflow as tf
 import numpy as np
 import matplotlib.pyplot as plt

 # Configure the matplotlib backend to plot inline in IPython
 %matplotlib inline


 episodes = 55
 batch_size = 5
 hidden_units = 10
 learning_rate = 1e-3

 def weight_variable(shape):
     initial = tf.truncated_normal(shape, stddev=0.1)
     return tf.Variable(initial)

 def bias_variable(shape):
     initial = tf.constant(0.1, shape=shape)
     return tf.Variable(initial)


 # Produce the data
 x_data = np.arange(-100, 100).astype(np.float32)
 y_data = np.abs(x_data + 20.)

 # Plot it.
 plt.plot(y_data)
 plt.ylabel('y_data')
 plt.show()

 # Might want to randomize the data
 # np.random.shuffle(x_data)
 # y_data = np.abs(x_data + 20.)

 # reshape data ...
 x_data = x_data.reshape(200, 1)
 y_data = y_data.reshape(200, 1)

 # create placeholders to pass the data to the model
 x = tf.placeholder('float', shape=[None, 1])
 y_ = tf.placeholder('float', shape=[None, 1])

 W1 = weight_variable([1, hidden_units])
 b1 = bias_variable([hidden_units])
 r1 = tf.nn.relu(tf.matmul(x, W1) + b1)

 # Input of r1 into r2 (which is just y)
 W2 = weight_variable([hidden_units, 1])
 b2 = bias_variable([1])
 y = tf.nn.relu(tf.matmul(r1,W2)+b2) 

 # OPTION 2 
 # Add r1 to r2 -- won't be able to reduce the error.
 #W2 = weight_variable([1, hidden_units])
 #b2 = bias_variable([hidden_units])
 #r2 = tf.nn.relu(tf.matmul(x, W2) + b2)
 #y = tf.add(r1,r2)


 mean_square_error = tf.reduce_sum(tf.square(y-y_))
 training = tf.train.AdamOptimizer(learning_rate).minimize(mean_square_error)

 sess = tf.InteractiveSession()
 sess.run(tf.initialize_all_variables())

 min_error = np.inf
 for _ in range(episodes):
     # iterate through the data in overlapping windows of batch_size rows
     for i in range(x_data.shape[0]-batch_size+1):
         _, error = sess.run([training, mean_square_error],  feed_dict={x: x_data[i:i+batch_size], y_:y_data[i:i+batch_size]})
         if error < min_error :
             min_error = error
             if min_error < 3:
                 print(error)
         #print(error)
         #print(error, x_data[i:i+batch_size], y_data[i:i+batch_size])


 # error = sess.run([training, mean_square_error],  feed_dict={x: x_data[i:i+batch_size], y_:y_data[i:i+batch_size]})
 # if error != None:
 #    print(error)


 sess.close()

 print("\n\nmin_error:",min_error)

It might be easier to see in a Jupyter notebook here
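One optional follow-up, not part of the original answer: before calling sess.close(), the fitted curve can be evaluated on the full range and plotted against the target to see how well either option models the kink. A minimal sketch, assuming only the names defined in the example above:

# Run the trained network on all 200 points and compare to the target
predictions = sess.run(y, feed_dict={x: x_data})
plt.plot(x_data, y_data, label='y_data (target)')
plt.plot(x_data, predictions, label='model output')
plt.legend()
plt.show()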

Mike Chirico

Here is a simple feedforward network with one hidden layer.

import numpy as np
import tensorflow as tf

episodes = 55
batch_size = 5
hidden_units = 10
learning_rate = 1e-3

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# normalize the data and shuffle them
x_data = np.arange(0, 1, 0.005).astype(float)
np.random.shuffle(x_data)
y_data = np.abs(x_data + .1)

# reshape data ...
x_data = x_data.reshape(200, 1)
y_data = y_data.reshape(200, 1)

# create placeholders to pass the data to the model
x = tf.placeholder('float', shape=[None, 1])
y_ = tf.placeholder('float', shape=[None, 1])

W1 = weight_variable([1, hidden_units])
b1 = bias_variable([hidden_units])
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

W2 = weight_variable([hidden_units, 1])
b2 = bias_variable([1])
y = tf.matmul(h1, W2) + b2

mean_square_error = tf.reduce_sum(tf.square(y-y_))
training = tf.train.AdamOptimizer(learning_rate).minimize(mean_square_error)

sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())

for _ in range(episodes):
    # iterate through the data in overlapping windows of batch_size rows
    for i in range(x_data.shape[0]-batch_size+1):
        _, error = sess.run([training, mean_square_error],  feed_dict={x: x_data[i:i+batch_size], y_:y_data[i:i+batch_size]})
        #print(error)
        print(error, x_data[i:i+batch_size], y_data[i:i+batch_size])


error = sess.run([training, mean_square_error],  feed_dict={x: x_data[i:i+batch_size], y_:y_data[i:i+batch_size]})
print(error)
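To actually use the trained network, new inputs can be fed through the same y tensor while the session is still open. A minimal sketch, not part of the original answer; the test values are arbitrary points in the normalized [0, 1) range used above:

# Predict on a few arbitrary test inputs with the trained network
test_x = np.array([[0.1], [0.5], [0.9]], dtype=np.float32)
print(sess.run(y, feed_dict={x: test_x}))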
natschz
  • Well, I guess that this doesn't answer my question. I'm asking how to model a specific function using ReLUs... – prl900 Apr 15 '16 at 07:56

Inspired by all the responses, I managed to train this model using the optimizer proposed in the accepted answer. Here is the code:

import tensorflow as tf
import numpy as np

# Create 200 x, y data points in NumPy to represent the function
x_data = np.arange(-100, 100).astype(np.float32)
y_data = np.abs(x_data + 20.) 

W1 = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b1 = tf.Variable(tf.zeros([1]))
W2 = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b2 = tf.Variable(tf.zeros([1]))
y = tf.nn.relu(W1 * x_data + b1) + tf.nn.relu(W2 * x_data + b2)

# Minimize the mean squared errors.
mean_square_error = tf.reduce_sum(tf.square(y-y_data))
learning_rate = 0.01  # not defined in the original snippet; any small Adam learning rate should work
train = tf.train.AdamOptimizer(learning_rate).minimize(mean_square_error)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
# Fit the non-linear function.
for step in range(50000):
    sess.run(train)
    if step % 10000 == 0:
        # Expected values: W1 = 1., W2 = -1., b1 = 20., b2 = -20.
        print(step, sess.run(W1), sess.run(b1), sess.run(W2), sess.run(b2))
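As a quick sanity check, not part of the original answer, the expected parameter values can be verified directly in NumPy, since relu(x + 20) + relu(-x - 20) = |x + 20| for every x:

# Verify the closed form that the two ReLUs are expected to converge to
relu = lambda z: np.maximum(z, 0.)
y_exact = relu(1. * x_data + 20.) + relu(-1. * x_data - 20.)
print(np.allclose(y_exact, y_data))  # True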
prl900
  • I think TensorBoard can be very helpful too when building models, but the documentation is somewhat difficult to follow. I kept getting errors until I added "tf.reset_default_graph()", which isn't listed in the documentation. ref: https://www.kaggle.com/mchirico/d/uciml/iris/tensorflow-on-iris – Mike Chirico Apr 20 '16 at 02:57