An activation function is a non-linear transformation, usually applied in neural networks to the output of a linear or convolutional layer. Common activation functions: sigmoid, tanh, ReLU, etc.
Questions tagged [activation-function]
343 questions
3
votes
1 answer
SeLU Activation Function x-Parameter causes a TypeError
I am building a CNN and am defining a fully connected layer with SeLU as its activation and AlphaDropout(0.5). I am trying to initialize SeLU with a tf.random.normal distribution as follows:
dist = tf.Variable(tf.random.normal([5, 5, 1, 32],…

Catastrophe
- 43
- 4
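A minimal sketch of how this is often set up in tf.keras (layer sizes are hypothetical): SELU is usually paired with the lecun_normal kernel initializer and AlphaDropout, while a tensor drawn from tf.random.normal would be a weight variable rather than an argument to the activation itself.

import tensorflow as tf

# SELU with the initializer it was derived for; AlphaDropout preserves the
# self-normalizing property better than plain Dropout.
fc = tf.keras.layers.Dense(128, activation="selu",
                           kernel_initializer="lecun_normal")
drop = tf.keras.layers.AlphaDropout(0.5)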
3
votes
1 answer
How necessary are activation functions after a dense layer in neural networks?
I'm currently training multiple recurrent convolutional neural networks with deep Q-learning for the first time.
The input is an 11x11x1 matrix, and each network consists of 4 convolutional layers with dimensions 3x3x16, 3x3x32, 3x3x64, 3x3x64. I use…

patricia
- 33
- 4
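A minimal tf.keras sketch of the usual pattern for a Q-network head (num_actions and the layer sizes are hypothetical): hidden dense layers keep a non-linearity, while the output layer is left linear because Q-values are unbounded.

import tensorflow as tf

num_actions = 4  # hypothetical action count
model = tf.keras.Sequential([
    tf.keras.Input(shape=(11, 11, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),  # non-linear hidden layer
    tf.keras.layers.Dense(num_actions),             # linear output: Q-values are unbounded
])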
3
votes
1 answer
Restrict the sum of outputs in a neural network regression (Keras)
I'm predicting 7 targets, which are ratios of one value, so for each sample the sum of all predicted values should be 1.
Apart from using softmax at the output (which seems obviously incorrect), I just can't figure out other ways to restrict the sum of all…

Alex_Y
- 588
- 3
- 19
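A minimal tf.keras sketch of one alternative (n_features and the hidden size are hypothetical): give the 7 outputs a non-negative activation such as softplus, then divide by their sum in a Lambda layer so each sample's predictions add up to 1.

import tensorflow as tf

n_features = 20  # hypothetical input width
inputs = tf.keras.Input(shape=(n_features,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
raw = tf.keras.layers.Dense(7, activation="softplus")(x)          # non-negative outputs
ratios = tf.keras.layers.Lambda(
    lambda t: t / tf.reduce_sum(t, axis=-1, keepdims=True))(raw)  # rows sum to 1
model = tf.keras.Model(inputs, ratios)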
3
votes
1 answer
Assuming the order Conv2d->ReLU->BN, should the Conv2d layer have a bias parameter?
Should we include the bias parameter in Conv2d if we are going for Conv2d followed by ReLU followed by batch norm (bn)?
There is no need if we go for Conv2d followed by bn followed by ReLU, since the shift parameter of bn does the bias's work.

Venkataraman
- 138
- 1
- 9
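A minimal PyTorch-style sketch of the two orderings (channel counts are hypothetical): with Conv2d -> BN -> ReLU the conv bias is redundant because BN's shift absorbs it, whereas with Conv2d -> ReLU -> BN the bias changes what the ReLU zeroes out, so it is usually kept.

import torch.nn as nn

conv_relu_bn = nn.Sequential(
    nn.Conv2d(3, 16, 3, bias=True),   # bias kept: it shifts values before the ReLU cut-off
    nn.ReLU(),
    nn.BatchNorm2d(16),
)

conv_bn_relu = nn.Sequential(
    nn.Conv2d(3, 16, 3, bias=False),  # bias dropped: BN's shift (beta) plays the same role
    nn.BatchNorm2d(16),
    nn.ReLU(),
)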
3
votes
1 answer
Activation functions in Neural Networks
I have a few questions related to the usage of various activation functions in neural networks. I would highly appreciate it if someone could give good explanatory answers.
Why is ReLU used only on hidden layers specifically?
Why Sigmoid…

Sanjay Dutt
- 29
- 4
3
votes
2 answers
Which output activation and loss should I use if I want to predict a continuous outcome on the 0-1 interval?
I want to predict a continuous variable (autoencoder). As I have scaled my inputs by min-max to the 0-1 interval, does it make sense to use a sigmoid activation in the output layer? Sigmoid does not correspond to MSE loss then. Any ideas?

pikachu
- 690
- 1
- 6
- 17
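A minimal tf.keras sketch (n_inputs and the bottleneck size are hypothetical): a sigmoid output keeps predictions in (0, 1) and can be trained with MSE; binary cross-entropy is another common choice for targets scaled to [0, 1].

import tensorflow as tf

n_inputs = 32  # hypothetical feature count
inputs = tf.keras.Input(shape=(n_inputs,))
encoded = tf.keras.layers.Dense(8, activation="relu")(inputs)
decoded = tf.keras.layers.Dense(n_inputs, activation="sigmoid")(encoded)  # stays in (0, 1)
autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")  # loss="binary_crossentropy" also works here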
3
votes
1 answer
Output of a hidden layer for every epoch, stored in a list, in Keras?
I have a Keras MLP with a single hidden layer containing some specific number of nodes. I want to extract the activation values of all the neurons in that hidden layer when a batch is passed, and I…

Razor
- 89
- 9
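A minimal tf.keras sketch of one way to do this (the layer name "hidden" and the probe batch are hypothetical): build a second model that ends at the hidden layer and record its output from a callback at the end of every epoch.

import tensorflow as tf

hidden_outputs = []  # one entry per epoch

class SaveHiddenActivations(tf.keras.callbacks.Callback):
    def __init__(self, extractor, probe_batch):
        super().__init__()
        self.extractor = extractor        # sub-model ending at the hidden layer
        self.probe_batch = probe_batch    # fixed batch to record activations for

    def on_epoch_end(self, epoch, logs=None):
        hidden_outputs.append(self.extractor(self.probe_batch).numpy())

# extractor = tf.keras.Model(mlp.input, mlp.get_layer("hidden").output)
# mlp.fit(x_train, y_train, callbacks=[SaveHiddenActivations(extractor, x_train[:32])])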
3
votes
1 answer
Neural network - exercise
I am currently teaching myself the concept of neural networks, and I am working with the very good PDF from
http://neuralnetworksanddeeplearning.com/chap1.html
There are also a few exercises I did, but there is one exercise I really don't…

SMS
- 348
- 2
- 13
3
votes
2 answers
What are the disadvantages of Leaky-ReLU?
We use ReLU instead of the sigmoid activation function since it avoids the vanishing and exploding gradient problems found in sigmoid-like activation functions.
Leaky-ReLU is one of ReLU's improvements. Everyone is talking about the…

YFye
- 77
- 2
- 4
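A minimal NumPy sketch for reference: Leaky-ReLU keeps a small gradient for negative inputs, at the cost of an extra slope hyperparameter that has to be chosen.

import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope instead of a hard zero

def d_leaky_relu(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)     # gradient never dies, but alpha must be tuned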
3
votes
2 answers
Why is a linear function useless in a multi-layer neural network? How does the last layer become a linear function of the input to the first layer?
I was studying activation functions in NNs but could not understand this part properly:
"Each layer is activated by a linear function. That activation in turn goes into the next level as input and the second layer calculates weighted sum on…

Farhana Yasmeen
- 33
- 3
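A minimal NumPy sketch of the argument (shapes are arbitrary): two stacked linear layers are exactly one affine map of the input, which is why purely linear activations add no representational depth.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                 # network input
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)   # first "layer"
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)   # second "layer"

two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)             # a single affine map
print(np.allclose(two_layers, one_layer))              # True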
3
votes
1 answer
Keras - NaN in summary histogram LSTM
I've written an LSTM model using Keras with the LeakyReLU advanced activation:
# ADAM Optimizer with learning rate decay
opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)
# build the model
…

Shlomi Schwartz
- 8,693
- 29
- 109
- 186
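A minimal sketch of one common first step, assuming the same Keras optimizers module as in the question: NaNs in the summary histograms frequently trace back to exploding gradients, and the optimizer's clipnorm argument caps the gradient norm per update.

from keras import optimizers

# Same settings as above, plus gradient-norm clipping
opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999,
                      epsilon=1e-08, decay=0.0001, clipnorm=1.0)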
3
votes
1 answer
ReLU derivative with NumPy
import numpy as np

def relu(z):
    return np.maximum(0, z)

def d_relu(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z

x = np.array([5, 1, -4, 0])
y = relu(x)
z = d_relu(y)
print("y = {}".format(y))
print("z = {}".format(z))
The code above prints out:
y = [1…

Egbert
- 35
- 1
- 6
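A minimal NumPy sketch of what likely confuses the output above: d_relu mutates its argument in place, so calling it on y overwrites y before it is printed; a non-mutating version (applied to the pre-activation values) avoids that.

import numpy as np

def d_relu(z):
    return (z > 0).astype(z.dtype)   # 1 where z > 0, else 0; z itself is untouched

x = np.array([5, 1, -4, 0])
print(d_relu(x))                     # [1 1 0 0]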
3
votes
1 answer
Why does a custom activation function cause the network to have both zero loss and low accuracy?
I was trying to build a custom activation function using tflearn by making the following changes:
add my custom activation function to activation.py
def my_activation(x):
    return tf.where(x >= 0.0, tf.div(x**2, x + tf.constant(0.6)), 0.01 * x)
and…

応振强
- 266
- 3
- 12
3
votes
1 answer
Linear activation function in Word to Vector
In the word2vec paper, they use a linear activation function. My reasoning is that they provide enough training data for learning the word embeddings, so a non-linear activation function is not necessary; am I correct?
Also, if we use a non-linear…

Azad
- 71
- 4
3
votes
1 answer
Normalizing complex values in NumPy / Python
I am currently trying to normalize complex values.
As I don't have a good way of doing this, I decided to divide my dataset into two: data with only the real part and data with only the imaginary part.
def split_real_img(x):
…

I am not Fat
- 283
- 11
- 36
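A minimal NumPy sketch (the sample array is made up): the real and imaginary parts can be taken directly without duplicating the dataset, or each value can be scaled by its magnitude.

import numpy as np

x = np.array([3 + 4j, 1 - 2j, -0.5 + 0.5j])

real_part, imag_part = x.real, x.imag            # split without building two datasets
unit_phase = x / np.maximum(np.abs(x), 1e-12)    # each value scaled to unit magnitude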