
I'm trying to code several types of ANN algorithms in Python in order to get a better understanding/intuition of them. I'm not using scikit-learn or any other ready-to-go packages, since my goal is educational rather than practical. As an example problem, I use the MNIST database (http://yann.lecun.com/exdb/mnist/).

While implementing a simple 1-hidden-layer NN and a convolutional NN, I successfully avoided any second-order optimization methods and thus never had to compute the Hessian matrix. However, I then got to Bayesian NNs, where computing the Hessian is compulsory in order to optimize the hyperparameters.
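
To be concrete about where the Hessian enters (this is just my understanding of the standard Laplace/evidence-framework formulas; here α and β are the prior and noise precisions, W and N the numbers of weights and data points):

```latex
A = \nabla\nabla E(w_{\mathrm{MP}}) = \beta\,\nabla\nabla E_D + \alpha I    % the Hessian
p(w \mid D) \approx \mathcal{N}(w_{\mathrm{MP}},\, A^{-1})                  % Laplace approximation of the posterior
\ln p(D \mid \alpha, \beta) \approx -\alpha E_W(w_{\mathrm{MP}}) - \beta E_D(w_{\mathrm{MP}})
    - \tfrac{1}{2}\ln\det A + \tfrac{W}{2}\ln\alpha + \tfrac{N}{2}\ln\beta + \mathrm{const}
```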

In my fully connected network there are 784 inputs, 300 hidden units, and 10 output units, which results in 238200 weights (plus biases). When I try to compute or even approximate the Hessian (by the outer product of gradients), Python raises a MemoryError. Even if I decrease the number of weights to ~40000 and no error message appears, my computer gets stuck after several minutes. As I understand it, the problem is that the desired matrix is extremely large. I looked through a couple of articles on Bayesian NNs and noticed that the authors usually use network architectures with no more than 10 or 20 inputs and hidden units, and thus far fewer parameters than I have. However, I have not seen any explicit statement of such a restriction.
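
For a sense of scale, here is a back-of-the-envelope sketch (assuming float64 entries and ignoring the biases) of why the dense matrix cannot even be stored:

```python
# Memory cost of a dense Hessian for the network above,
# assuming 8-byte (float64) entries; biases ignored for simplicity.
n_params = 784 * 300 + 300 * 10          # = 238200 weights
dense_bytes = n_params ** 2 * 8          # full n x n matrix of float64
print(f"{dense_bytes / 1e9:.0f} GB")     # ~454 GB -- far beyond typical RAM

# The outer-product-of-gradients approximation, H ~ sum_i g_i g_i^T, has the
# same footprint once the n x n matrix is materialized, e.g.
#   H = numpy.zeros((n_params, n_params))   # this allocation alone fails
```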

What can I do in order to apply the Bayesian approach to a NN for MNIST?

More generally: is it possible to apply the Bayesian approach to this architecture (238200 weights) or an even larger one? Or is it suitable only for relatively small networks?

Amir
Nikolay

1 Answer


You could try the BFGS algorithm, a quasi-Newton method that approximates the Hessian and tends to save (considerable) memory. There's an implementation in SciPy.
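
For a parameter count this large, the limited-memory variant (L-BFGS, see the comment below) is the one that actually avoids storing a dense Hessian approximation. A minimal sketch with `scipy.optimize.minimize`, where `loss_and_grad` is a hypothetical placeholder for the network's loss and gradient over the flattened weight vector:

```python
import numpy as np
from scipy.optimize import minimize

# L-BFGS-B keeps only a handful of recent gradient pairs ("correction pairs")
# instead of a dense n x n Hessian approximation.
def loss_and_grad(w):
    # Placeholder for a forward/backward pass over the network.
    loss = float(np.dot(w, w))   # dummy objective
    grad = 2.0 * w               # its gradient
    return loss, grad

w0 = np.random.randn(238200) * 0.01          # flattened initial weights
res = minimize(loss_and_grad, w0, jac=True, method="L-BFGS-B",
               options={"maxiter": 50, "maxcor": 10})  # store ~10 correction pairs
print(res.fun, res.nit)
```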

James Atwood
  • I guess you are thinking about L-BFGS (https://en.wikipedia.org/wiki/Limited-memory_BFGS); plain BFGS requires the same amount of memory as typical Hessian-based methods, so if the OP's problem lies in memory consumption, BFGS will fail too. – lejlot Nov 06 '15 at 19:25
  • Yup. Thanks for the clarification. – James Atwood Nov 07 '15 at 22:00