I have been playing around in TensorFlow
and made a generic fully connected model.
At each layer I'm applying
sigmoid(WX + B)
which, as everybody knows, works well.
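For concreteness, here is a minimal sketch of the kind of layer I mean (TF 2.x style; the class name, shapes, and initializers are just illustrative choices):

```python
import tensorflow as tf

# Standard fully connected layer computing sigmoid(WX + B).
# Shapes and initializers are illustrative assumptions.
class SigmoidLayer(tf.Module):
    def __init__(self, in_dim, out_dim):
        self.W = tf.Variable(tf.random.normal([in_dim, out_dim], stddev=0.1))
        self.B = tf.Variable(tf.zeros([out_dim]))

    def __call__(self, X):
        # X: [batch, in_dim] -> output: [batch, out_dim]
        return tf.sigmoid(tf.matmul(X, self.W) + self.B)
```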
I then started messing around with the function applied at each layer and found that functions such as
sigmoid(U(X^2) + WX + B)
work just as well once optimized.
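A sketch of that quadratic variant, with X^2 taken as the elementwise square of the input (again, the names are just for illustration):

```python
import tensorflow as tf

# Variant layer computing sigmoid(U(X^2) + WX + B), where X^2 is the
# elementwise square of the input.
class QuadraticSigmoidLayer(tf.Module):
    def __init__(self, in_dim, out_dim):
        self.U = tf.Variable(tf.random.normal([in_dim, out_dim], stddev=0.1))
        self.W = tf.Variable(tf.random.normal([in_dim, out_dim], stddev=0.1))
        self.B = tf.Variable(tf.zeros([out_dim]))

    def __call__(self, X):
        # tf.square(X) squares the inputs elementwise; U mixes the squared
        # terms the same way W mixes the linear terms.
        return tf.sigmoid(tf.matmul(tf.square(X), self.U)
                          + tf.matmul(X, self.W)
                          + self.B)
```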
What does varying this inner function accomplish? Is there an application in which changing the inner function would improve the model's learning, or would any function that combines the input with some weights have the same learning capacity, regardless of the data being learned?
I'm aware of many other kinds of neural nets (convolutional, recurrent, residual, etc.), so I'm not looking for an explanation of different architectures (unless, of course, a certain type directly applies to what I'm describing). I'm mostly interested in the simple fully connected scenario.