import numpy as np
import tensorflow as tf

# Inputs and Placeholders
x = tf.placeholder(tf.float32, shape=[None, 30])
y_ = tf.placeholder(tf.float32)

# Inference
W_1 = tf.Variable(tf.zeros([30,50]))
b_1 = tf.Variable(tf.zeros([50]))
layer_1 = tf.sigmoid(tf.add(tf.matmul(x, W_1), b_1))

W_2 = tf.Variable(tf.zeros([50,25]))
b_2 = tf.Variable(tf.zeros([25]))
layer_2_logits = tf.add(tf.matmul(layer_1, W_2), b_2)
layer_2 = tf.nn.softmax(layer_2_logits)

# W_3 is a fixed (non-trainable) weight vector with 25 entries, one per softmax output.
w_linear = np.array([item + 5 for item in range(50, 300, 10)])
W_3 = tf.Variable(w_linear, trainable=False)
y = tf.reduce_sum(tf.multiply(tf.to_float(W_3), layer_2))

# Loss
# tf.losses.mean_squared_error(labels, predictions) already returns a scalar,
# so the reduce_mean below is effectively a no-op.
mean_square = tf.losses.mean_squared_error(y_, y)
loss = tf.reduce_mean(mean_square, name='square_mean')

# Training
tf.summary.scalar('loss', loss)
learning_rate = 0.01  # hypothetical value; the actual rate is not shown in this snippet
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)

In this network, I am trying to compute a weighted sum of the outputs of layer_2 (a softmax layer). The final linear layer is not involved in training.
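One thing I am not sure about: tf.reduce_sum without an axis argument also sums over the batch dimension, so y above is a single scalar for the whole batch. If one weighted value per example is what's intended, the sketch below (same tensors as above) keeps the batch axis; this is only an illustration of the alternative, not part of my original code:

# Hypothetical per-example variant: shape [batch_size] instead of a scalar.
y_per_example = tf.reduce_sum(tf.multiply(tf.to_float(W_3), layer_2), axis=1)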

Does anyone know why the loss stops changing after the second epoch?

('Epoch:0001', 'cost=2499180.068965517')
('Epoch:0002', 'cost=2335760.384482760')
('Epoch:0003', 'cost=2335760.384482760')
('Epoch:0004', 'cost=2335760.384482760')
('Epoch:0005', 'cost=2335760.384482760')
('Epoch:0006', 'cost=2335760.384482760')
('Epoch:0007', 'cost=2335760.384482760')
('Epoch:0008', 'cost=2335760.384482760')
('Epoch:0009', 'cost=2335760.384482760')
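For context, the cost values above come from a loop roughly like the sketch below; X_train and Y_train stand in for my actual data arrays, which are not shown here:

# Sketch of the training loop that produced the log above.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(9):
        _, c = sess.run([train_op, loss], feed_dict={x: X_train, y_: Y_train})
        print('Epoch:%04d' % (epoch + 1), 'cost=%.15f' % c)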
    What's the purpose of having a layer after the softmax? That is very unusual. – Dr. Snoopy Jun 09 '17 at 17:32
  • Could you set that layer to trainable=True and see if it learns anything? – Harsha Pokkalla Jun 09 '17 at 18:53
  • But the last layer isn't supposed to change. Each node of the softmax layer represents a value range, and we take the weighted sum to get the final value. – Luca Jun 09 '17 at 19:22
  • No, I am just saying to try it and see whether the loss decreases or not. If it doesn't, the error might be somewhere else. – Harsha Pokkalla Jun 09 '17 at 20:23
  • I just replaced W_1 = tf.Variable(tf.zeros([30,50])) with W_1 = tf.Variable(tf.random_normal([30, 50], stddev=0.35)), and it works. I guess I shouldn't use zeros as the initial value. – Luca Jun 09 '17 at 21:21
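
Based on the last comment, the likely explanation is that initializing every weight to zero makes all units in a layer compute identical outputs and receive identical gradients, so they never differentiate and training stalls almost immediately. A minimal sketch of the change described in the comments (stddev=0.35 as in Luca's comment; the exact value is a judgment call):

# Break symmetry: initialize weights with small random values instead of zeros.
W_1 = tf.Variable(tf.random_normal([30, 50], stddev=0.35))
b_1 = tf.Variable(tf.zeros([50]))  # zero biases are fine once the weights differ
W_2 = tf.Variable(tf.random_normal([50, 25], stddev=0.35))
b_2 = tf.Variable(tf.zeros([25]))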
