
I have a neural network of the form N = W1 * Tanh(W2 * I), where I is the input vector/matrix. When I learn these weights, the output has a certain form. However, when I add a normalization layer, for example N' = Softmax( W1 * Tanh(W2 * I) ), a single element of the output vector of N' is close to 1 while the rest are almost zero. This happens not only with Softmax() but with any normalizing layer. Is there a standard solution to such a problem?
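A minimal sketch of the two variants described above (the dimensions, random weights, and input are hypothetical, chosen only to make the example runnable):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 16))   # hypothetical shapes
W2 = rng.standard_normal((16, 8))
I = rng.standard_normal(8)          # input vector

# Unnormalized model: N = W1 * Tanh(W2 * I)
N = W1 @ np.tanh(W2 @ I)

def softmax(x):
    e = np.exp(x - x.max())         # max-shift for numerical stability
    return e / e.sum()

# Normalized model: N' = Softmax(N) -- entries are positive and sum to 1,
# so whichever logit in N is largest tends to dominate the distribution.
N_prime = softmax(N)
```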

Rumu
    what do you mean by "certain form"? And why do you call it a problem? This is completely normal (and desired!) behaviour for normalizing in classification. What is the exact application (there is an attention tag yet no mention of attention in the question) – lejlot Oct 21 '17 at 19:05
  • It is a self-attention encoder-decoder model (as in N described above is a self-attention model) @lejlot By a certain form, I mean the output vector has certain characteristics (which are desired) like it increases till the middle and then decreases and increases alternately (e.g. 0.1,0.3,0.5, 1.5, 0.5, 1, 0.3, 1.2). However, after adding a Softmax Layer, I get something like this - (0.001, 0.001, 0, 0.01, 0.998, 0.001, 0, 0, ...). – Rumu Oct 22 '17 at 03:16
  • This simply means that the output `N` has one value significantly larger than others. Add the `N` value to the question. – Maxim Oct 22 '17 at 14:47

1 Answer


That is the expected behavior of the softmax function: it exponentiates its inputs, so the largest logit absorbs almost all of the probability mass. If you want each output squashed independently into (0, 1) rather than forced into a competitive distribution that sums to 1, perhaps what you need is a sigmoid function.
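A small illustration of the difference (plain Python, with the two functions defined inline; the logit vector is a made-up example where one value dominates):

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(xs):
    # Element-wise logistic function; each output is independent of the others.
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

logits = [1.0, 2.0, 10.0]
print(softmax(logits))  # the largest logit takes nearly all the probability mass
print(sigmoid(logits))  # each value squashed separately into (0, 1)
```

With softmax the third entry ends up near 1 and the others near 0, which is exactly the behavior described in the question; with sigmoid the smaller values stay clearly nonzero.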

Julio Daniel Reyes