Andrew Ng's lecture notes train the autoencoder with L-BFGS and obtain a set of hidden features. Can I use gradient descent instead and get the same hidden features, keeping all other parameters the same and changing only the optimization algorithm?
I ask because when I use L-BFGS, my autoencoder produces the same hidden features as in the lecture notes, but when I use gradient descent, the features in the hidden layer are gone and look completely random.
To be specific, in order to optimize the cost function, I implement 1) the cost function and 2) the gradient with respect to each weight and bias, and pass them to the scipy.optimize toolbox. This setup gives me reasonable hidden features.
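A minimal sketch of this kind of setup, not the exact code: it assumes a single-hidden-layer sigmoid autoencoder with squared-error cost and weight decay (the sparsity penalty from the lecture notes is left out for brevity), random data in place of image patches, and `scipy.optimize.minimize` with `method="L-BFGS-B"`. The names `cost_and_grad`, `unpack`, and the layer sizes are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def unpack(theta, n_visible, n_hidden):
    """Split the flat parameter vector into W1, W2, b1, b2."""
    i = 0
    W1 = theta[i:i + n_hidden * n_visible].reshape(n_hidden, n_visible)
    i += n_hidden * n_visible
    W2 = theta[i:i + n_visible * n_hidden].reshape(n_visible, n_hidden)
    i += n_visible * n_hidden
    b1 = theta[i:i + n_hidden]
    i += n_hidden
    b2 = theta[i:i + n_visible]
    return W1, W2, b1, b2


def cost_and_grad(theta, X, n_visible, n_hidden, weight_decay=1e-4):
    """Return (cost, gradient) for a flat parameter vector.

    X has shape (n_visible, m), one example per column.
    Assumption: no sparsity term, only squared error plus weight decay.
    """
    W1, W2, b1, b2 = unpack(theta, n_visible, n_hidden)
    m = X.shape[1]

    # Forward pass
    a2 = sigmoid(W1 @ X + b1[:, None])
    a3 = sigmoid(W2 @ a2 + b2[:, None])
    diff = a3 - X
    cost = 0.5 * np.sum(diff ** 2) / m
    cost += 0.5 * weight_decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # Backward pass
    d3 = diff * a3 * (1.0 - a3)
    d2 = (W2.T @ d3) * a2 * (1.0 - a2)
    gW2 = d3 @ a2.T / m + weight_decay * W2
    gW1 = d2 @ X.T / m + weight_decay * W1
    gb2 = d3.sum(axis=1) / m
    gb1 = d2.sum(axis=1) / m

    grad = np.concatenate([gW1.ravel(), gW2.ravel(), gb1, gb2])
    return cost, grad


# Random stand-in data (assumption: real use would pass image patches).
rng = np.random.default_rng(0)
n_visible, n_hidden = 8, 3
X = rng.random((n_visible, 50))
theta0 = rng.normal(scale=0.1, size=2 * n_visible * n_hidden + n_hidden + n_visible)

# jac=True tells scipy that the function returns (cost, gradient) together.
result = minimize(cost_and_grad, theta0, args=(X, n_visible, n_hidden),
                  jac=True, method="L-BFGS-B", options={"maxiter": 200})
W1, W2, b1, b2 = unpack(result.x, n_visible, n_hidden)
```

Here the rows of `W1` play the role of the hidden features that get visualized.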
But when I switch to gradient descent, I update the parameters as "Weight - gradient of the Weight" and "Bias - gradient of the Bias", and the resulting hidden features look completely random.
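The update described above can be sketched as the loop below. This is only an illustration under assumptions: `gradient_descent` and `toy_cost_and_grad` are hypothetical names, a simple quadratic stands in for the autoencoder cost, and the `learning_rate` parameter is an addition, where `learning_rate=1.0` corresponds to subtracting the raw gradient as described.

```python
import numpy as np


def gradient_descent(cost_and_grad, theta0, learning_rate=1.0, n_iters=100):
    """Plain batch gradient descent on a flat parameter vector.

    cost_and_grad(theta) must return (cost, gradient).
    learning_rate=1.0 reproduces the update "theta - gradient".
    """
    theta = np.array(theta0, dtype=float)
    history = []
    for _ in range(n_iters):
        cost, grad = cost_and_grad(theta)
        history.append(cost)  # cost before this step's update
        theta = theta - learning_rate * grad
    return theta, history


# Toy stand-in cost (assumption): 0.5 * ||theta - target||^2
target = np.array([1.0, -2.0, 0.5])


def toy_cost_and_grad(theta):
    diff = theta - target
    return 0.5 * np.dot(diff, diff), diff


theta, history = gradient_descent(toy_cost_and_grad, np.zeros(3),
                                  learning_rate=0.1, n_iters=200)
```

Recording the cost at every iteration in `history` makes it possible to compare how far gradient descent gets against the final cost reported by L-BFGS on the same cost function.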
Can somebody help me understand the reason? Thanks.