
In the problem I am trying to solve, my output domain is zero-centered, between -1 and 1. When looking up activation functions I noticed that ReLU outputs only non-negative values, which basically means that your outputs all end up on one side of zero.

This can be mapped back to the appropriate domain through inverse normalization, but ReLU is designed to determine the "strength" of a neuron in a single direction, whereas in my problem I need to determine the strength of a neuron in one of two directions. If I use tanh, I have to worry about vanishing/exploding gradients, but if I use ReLU my output will always be "biased" towards positive or negative values, because essentially small values would have to be mapped to the positive domain and large values to the negative domain, or vice versa.
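
For concreteness, this is the kind of inverse normalization I mean (a minimal sketch with hypothetical helper names, assuming plain min-max scaling of the targets): shift the [-1, 1] targets into [0, 1] for training, then map the network's non-negative predictions back afterwards.

    import numpy as np

    def to_unit_range(y):
        """Map zero-centered targets in [-1, 1] into [0, 1] for a non-negative output."""
        return (y + 1.0) / 2.0

    def from_unit_range(y_hat):
        """Inverse normalization: map predictions in [0, 1] back to [-1, 1]."""
        return 2.0 * y_hat - 1.0

    y = np.array([-1.0, -0.25, 0.0, 0.5, 1.0])
    assert np.allclose(from_unit_range(to_unit_range(y)), y)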

Other info: I've used ReLU and it works well, but I fear it is for the wrong reasons. The reason I say this is that, for either the positive or the negative domain, approaching smaller values would mean a stronger connection up to a point, after which the neuron is not activated at all. Yes, the network can technically work (probably harder than it needs to) to keep the entire domain of training outputs in the positive space, but if a value happens to exceed the bounds of the training set it will be non-existent, when in reality it should be even more active.

What is the appropriate way to deal with zero-centered output domains?

3 Answers


First, you don't have to put an activation function after the last layer of your neural network. An activation function is needed between layers to introduce non-linearity, so it isn't required in the last layer.

You're free to experiment with various options:

  • Use tanh. Vanishing/exploding gradients are sometimes not a problem in practice, depending on the network architecture and whether you initialize the weights properly.
  • Do nothing. The NN should be trained to output values between -1 and 1 for "typical" inputs. You can clip the values in the application layer.
  • Clip the output in the network, e.g. out = tf.clip_by_value(out, -1.0, 1.0) (see the sketch after this list).
  • Be creative and try your other ideas.
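
A minimal tf.keras sketch of the first three options above (tanh output, plain linear output, and in-graph clipping); the layer sizes and input shape are arbitrary placeholders, not anything from your problem:

    import tensorflow as tf

    def make_model(output_mode="linear"):
        """Toy regression model with three ways of handling a [-1, 1] target range."""
        inputs = tf.keras.Input(shape=(16,))
        x = tf.keras.layers.Dense(64, activation="relu")(inputs)
        out = tf.keras.layers.Dense(1)(x)  # "do nothing": linear output, clip downstream if needed

        if output_mode == "tanh":
            out = tf.keras.layers.Activation("tanh")(out)  # squash into (-1, 1)
        elif output_mode == "clip":
            out = tf.keras.layers.Lambda(
                lambda t: tf.clip_by_value(t, -1.0, 1.0))(out)  # hard clip inside the graph
        return tf.keras.Model(inputs, out)

    model = make_model("clip")
    model.compile(optimizer="adam", loss="mse")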

In the end, ML is a process of trial and error. Try different things and find something that works for you. Good luck.

miaout17
  • I've used ReLU and it works well, but I fear it is for the wrong reasons. The reason I say this is that, for either the positive or the negative domain, approaching smaller values will mean a stronger connection up to a point, which doesn't seem right to me. Yes, the network can technically work (probably harder than it needs to) to keep the entire domain of training outputs in the positive space, but if a value happens to exceed the bounds of the training set it will be non-existent, when in reality it should be even more active. – learningthemachine Jan 11 '19 at 20:40
  • Vanishing gradients most certainly can be a problem in practice, e.g. in RNNs, even with good initial weights (sequences longer than 5 can already be problematic, IIRC). Furthermore, those would easily saturate. – Szymon Maszke Jan 11 '19 at 20:40
  • @SzymonMaszke Totally agree with you. Revised my statement a bit. – miaout17 Jan 11 '19 at 20:54

I think you have to use the sign function. It's zero-centered and has -1 and 1 as outputs.

Sign function: https://helloacm.com/wp-content/uploads/2016/10/math-sgn-function-in-cpp.jpg
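
For reference, the definition the link illustrates is the usual piecewise one; a plain Python restatement:

    def sign(x):
        """Sign function: -1 for negative inputs, 0 at zero, +1 for positive inputs."""
        if x > 0:
            return 1
        if x < 0:
            return -1
        return 0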

Totoro

You could go with variations of ReLU whose outputs have a mean closer to zero, or exactly zero (ELU, CELU, PReLU and others), and which have other interesting traits. Furthermore, they help with ReLU's dying-neuron problem.
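
These variants ship with the common frameworks; a minimal PyTorch sketch (arbitrary layer sizes, shown only to illustrate swapping the hidden activation, not anything specific to your problem):

    import torch
    import torch.nn as nn

    # ReLU variants whose outputs are not strictly non-negative; negative inputs
    # still receive a gradient, which also mitigates dying neurons.
    ACTIVATIONS = {"elu": nn.ELU, "celu": nn.CELU, "prelu": nn.PReLU}

    def make_net(activation_name="elu"):
        """Toy regression net; the hidden activation is one of the ReLU variants."""
        return nn.Sequential(
            nn.Linear(16, 64),
            ACTIVATIONS[activation_name](),
            nn.Linear(64, 1),  # linear output; handle the [-1, 1] range as discussed in the other answers
        )

    x = torch.randn(8, 16)
    print(make_net("celu")(x).shape)  # torch.Size([8, 1])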

Anyway, I'm not aware of any hard research proving the usefulness of one over the others; it is still in the experimentation phase and really problem-dependent, from what I recall (please correct me if I'm wrong).

And you should really check whether the activation function is a problem in your case at all; it might be totally fine to go with ReLU.

Szymon Maszke