
I'm normalizing my data to zero mean and unit variance, as recommended in most of the literature, to pre-train a GB-RBM. But no matter which learning rate I choose and how many epochs I train for, my mean reconstruction error never drops below around 0.6. Reconstruction errors for the stacked BB-RBMs easily drop to 0.01 within a few epochs. I've used several toolkits that implement GB-RBMs as described in http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf, but they all have the same issue. Am I missing something, or is the reconstruction error meant to stay above 50%?
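
By reconstruction error I mean the usual mean squared difference between the input and its reconstruction after one Gibbs step. A minimal NumPy sketch of that measurement (W, hbias and vbias are placeholder names, not taken from any particular toolkit):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gbrbm_recon_error(v, W, hbias, vbias):
    # v: batch of normalized frames (rows); W, hbias, vbias: GB-RBM parameters (placeholder names)
    h_mean = sigmoid(np.dot(v, W) + hbias)                           # P(h = 1 | v)
    h_sample = (np.random.rand(*h_mean.shape) < h_mean).astype(v.dtype)
    v_recon = np.dot(h_sample, W.T) + vbias                          # Gaussian visibles: linear mean, unit variance
    return np.mean((v - v_recon) ** 2)                               # mean squared reconstruction error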

I'm normalizing my data by subtracting the mean and dividing by the standard deviation along each dimension of the input vector:

% size(mfcc) --> [mlength rows x 39 cols]
mlength = size(mfcc, 1);

mmean = mean(mfcc);                         % per-dimension mean (1 x 39)
mstd  = std(mfcc);                          % per-dimension standard deviation (1 x 39)
mfcc  = mfcc - ones(mlength, 1) * mmean;    % subtract the mean from every row
mfcc  = mfcc ./ (ones(mlength, 1) * mstd);  % scale each dimension to unit variance

This does give me zero mean and unit variance along each dimension. I have tried different datasets, different features, and different toolkits, but my reconstruction error never drops below 0.6 for GB-RBMs. Thanks

muneeb
  • Hi and welcome to Stack Overflow. This isn't my area of expertise at all... but generally, to get help on SO it helps to include any relevant code. Then we can check it for errors or suggest better ways of doing things :) – Taryn East Jul 10 '14 at 07:23

1 Answer


I would guess you are computing the sigmoid with exp() and using a 3rd-party library for the matrix operations?

If that is the case, I would guess the 3rd-party library is swallowing the exp() overflow errors but still aborting the calculation, so the hidden/reconstructed vectors are invalid.

Edit, based on the comment below:

theano.tensor.nnet.sigmoid() uses exp(), so I would first try switching to hard_sigmoid(). It won't be as nice a curve, but it won't overflow/underflow, so you can see whether that is the source of the error.
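
Something along these lines, just as a sketch (pre_sigmoid_h stands in for whatever W*v + b expression the library builds):

import theano
import theano.tensor as T

pre_sigmoid_h = T.matrix('pre_sigmoid_h')        # stand-in for the library's W*v + b expression
h_soft = T.nnet.sigmoid(pre_sigmoid_h)           # exp()-based logistic: can overflow/underflow
h_hard = T.nnet.hard_sigmoid(pre_sigmoid_h)      # piecewise-linear approximation: no exp()

compare = theano.function([pre_sigmoid_h], [h_soft, h_hard])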

I assume you tried other data preprocessing and still had the high reconstruction errors?

  • The first sentence is probably better as a comment, although @Muneeb seems unlikely to respond. – kdgregory Aug 17 '14 at 21:18
  • I'm using the Theano library for calculating both the sigmoid and matrix operations. GBRBM code: (http://sourceforge.net/p/kaldipdnn/code-0/HEAD/tree/pdnn/layers/rbm.py) I get good results for Bernoulli-Bernoulli RBMs. Even for the GBRBM, the Free Energy is reduced to around zero, but reconstruction errors stay above 0.5. – muneeb Aug 17 '14 at 23:10
  • I can't comment on posts due to my low rep, so I have to ask questions in an answer. Of course that doesn't exactly help with the rep. So, Yeah. – Mark Vicuna Aug 17 '14 at 23:48
  • Thanks for the hint. Using hard_sigmoid increased the error from 0.68 to 0.76. Changing to ultra_fast_sigmoid reduced it to 0.62. Still not close. I'm pre-processing to normalize all data to zero mean and unit variance across each dimension of feature vector (since the RBM is not built to learn variances). Is there any other pre-processing you would suggest? – muneeb Aug 21 '14 at 19:36
  • What I've found is the different types of RBM want different min/max depending on how the data is distributed in the features. I'd move the mean to 0.5 and move the FWHM in. – Mark Vicuna Aug 22 '14 at 02:20
  • My edit window timed out. What I've found is the different types of RBM want different min/max for the inputs depending on how the data is distributed in the features. I would check to see what the sum(w_ij*v_ij) is before the sigmoid is applied (a quick sketch of that check follows after these comments). My guess is that it will be very large or very small, so it's 'saturated' and getting stuck. – Mark Vicuna Aug 22 '14 at 02:28
  • Changing the mean/variance would require a change in the energy function as well, right? – muneeb Aug 22 '14 at 02:29
  • The pre-sigmoid sum is between -4.5 and +4.5. I tried focusing initial weights to zero (on non-linearity), but errors still don't go below 0.64; tried the smallest and largest learning rates. The only way to make error go down is to use probabilities of hidden layer activations instead of bernoulli states while sampling visible nodes i.e. when the H in P(v|H) is a probability instead of a random on/off state. But this is against every implementation of RBM I've seen. – muneeb Aug 22 '14 at 04:51
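
A quick sketch of the saturation check mentioned in the comments above (plain NumPy; v, W and hbias are placeholder names for the visible batch, weight matrix and hidden bias):

import numpy as np

def check_presigmoid(v, W, hbias):
    # pre-activation of the hidden units for a batch of normalized frames
    pre = np.dot(v, W) + hbias
    print("pre-sigmoid min: %.3f  max: %.3f  mean: %.3f" % (pre.min(), pre.max(), pre.mean()))
    # values far outside roughly [-6, 6] would mean an exp()-based sigmoid is saturated
    return pre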