
I work on Stacked Sparse Autoencoders using MATLAB. Can anyone please suggest what values should be used for the Stacked Sparse Autoencoder parameters:
L2 Weight Regularization (Lambda), Sparsity Regularization (Beta), and Sparsity Proportion (Rho)?

1 Answer


It is important to realise that there are NO OBVIOUS VALUES for these hyperparameters. The optimal values vary with the data you're modeling: you'll have to try them out on your own data.

From sparseAutoencoder: Lambda (λ) is the coefficient of the weight decay term, which discourages the weights from growing large, since large weights can cause overfitting. The weight decay term (or weight regularization term) is part of the cost function, like the sparsity term explained below.

Rho (ρ) is the sparsity constraint, which controls the average activation of the hidden units. It is included so that the autoencoder still learns useful features even when the number of hidden units is large relative to the number of input units. For example, if the input size is 100 and the hidden size is 100 or larger (or even slightly smaller), the output can be reconstructed without any loss, since the hidden units can simply learn the identity function. Beta (β) is the coefficient of the sparsity term, which is also part of the cost function; it controls the relative importance of the sparsity penalty. Together, Lambda and Beta specify the relative importance of their respective terms in the cost function.
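Putting those pieces together, the sparse autoencoder cost from the UFLDL notes has roughly this form (a sketch of the standard formulation, not taken from the question's code):

\[
J_{\text{sparse}}(W,b) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\left\| h_{W,b}\!\left(x^{(i)}\right) - x^{(i)} \right\|^2 + \frac{\lambda}{2}\sum_{l,i,j}\left(W_{ij}^{(l)}\right)^2 + \beta\sum_{j=1}^{s_2} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right)
\]

where \(\hat{\rho}_j\) is the average activation of hidden unit \(j\) over the training set and \(\mathrm{KL}(\rho\|\hat{\rho}_j) = \rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\). Raising β pushes the average activations \(\hat{\rho}_j\) toward ρ; raising λ shrinks the weights.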

Example: you can take a look at this example, where the parameter values are selected as follows.

sparsityParam = 0.1;   % desired average activation of the hidden units
                       % (denoted by the Greek letter rho, which looks like a
                       %  lower-case "p", in the lecture notes)
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of sparsity penalty term
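Since the question is about MATLAB's built-in trainAutoencoder, here is a minimal sketch of how the three hyperparameters map onto its name-value arguments. The data, hidden size, and numeric values below are placeholders for illustration, not recommendations:

X = rand(589, 1000);                      % dummy data: 589 features x 1000 examples
hiddenSize = 100;                         % placeholder; tune for your task

autoenc = trainAutoencoder(X, hiddenSize, ...
    'L2WeightRegularization', 0.004, ...  % lambda: weight decay coefficient
    'SparsityRegularization', 4, ...      % beta: weight of the sparsity penalty
    'SparsityProportion', 0.15);          % rho: desired average activation

features = encode(autoenc, X);            % these features feed the next stacked layer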

But once again, I want to remind you that there are NO OBVIOUS VALUES for the hyperparameters.

Wasi Ahmad
  • Thank you so much for your reply @Wasi Ahmad. I am trying to build a 3-layer stacked sparse autoencoder model: an input layer of size 589, followed by 3 autoencoder layers, followed by an output layer that is a classifier. As you said, if my input layer is 589 and I set the hidden size of the first autoencoder layer to 589, what should the hidden sizes of the second and third autoencoder layers be? And do we have to keep the values of sparsityParam, lambda, and beta the same for all autoencoder layers? Please help, sir. – praveen gb Dec 02 '16 at 10:59
  • First, the hyperparameter values should be the same for all layers. Keep in mind that the model has hyperparameters, not the individual layers! So why would you select different hyperparameters for different layers? Second, the size of each hidden layer is a crucial thing to decide, because a large value may overfit your training data and a small value may underfit. You can set a different size for each hidden layer, but in many (or most) cases people use the same size for all hidden layers. You can tune it and see what the best size is for your task! (See the stacking sketch below.) – Wasi Ahmad Dec 02 '16 at 16:12
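For context, a minimal sketch of the 3-layer stacked pipeline discussed in these comments, using MATLAB's encode, trainSoftmaxLayer, and stack. The layer sizes are arbitrary placeholders, and autoenc1/autoenc2/autoenc3 are assumed to have been trained with trainAutoencoder as above, each on the previous layer's features:

feat1 = encode(autoenc1, X);       % autoenc1 trained on X      (e.g. 589 -> 300)
feat2 = encode(autoenc2, feat1);   % autoenc2 trained on feat1  (e.g. 300 -> 150)
feat3 = encode(autoenc3, feat2);   % autoenc3 trained on feat2  (e.g. 150 -> 75)

softnet = trainSoftmaxLayer(feat3, T);   % T: one-hot class labels
deepnet = stack(autoenc1, autoenc2, autoenc3, softnet);
deepnet = train(deepnet, X, T);          % optional supervised fine-tuning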