I am trying to determine the exact mathematics that deeplearning4j uses to train feed-forward classification networks with stochastic gradient descent. I have tried stepping through the code, but I keep getting lost in the forest.
Is this documented anywhere?