
I am trying to implement an autoencoder using the regularization method described in the paper "Saturating Auto-Encoders", Goroshin et al., 2013.

Essentially, the method penalizes the distance between each hidden-layer activation and the nearest flat (saturated) region of the nonlinearity used to compute it.

Assuming we are using a step function as the nonlinearity, with the step being at 0.5, a simple implementation might be:

y_prime = numpy.empty_like(y)
for i in range(len(y)):
    if y[i] < 0.5:
        y_prime[i] = 0
    else:
        y_prime[i] = 1

Then, the regularization cost can be simply:

numpy.abs(y - y_prime).sum()
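
The same thing can be written in vectorized numpy, just to make the intended penalty clear in a single expression:

import numpy

# nearest flat value of the step nonlinearity: 0 below the step, 1 at or above it
y_prime = numpy.where(y < 0.5, 0.0, 1.0)
saturation_cost = numpy.abs(y - y_prime).sum()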

I am trying to implement this functionality in Theano. I started off with the denoising autoencoder code available on the Theano website and made some basic modifications to it:

def get_cost_updates(self, learning_rate, L1_reg):
    # I do not want to add noise as of now, hence no input corruption.
    # Directly compute the hidden layer values.
    y = self.get_hidden_values(self.x)
    z = self.get_reconstructed_input(y)

    # Also, the original code computes the cross entropy loss function.
    # I want to use the quadratic loss as my inputs are real valued, not
    # binary. Further, I have added an L1 regularization term to the hidden
    # layer values.

    L = 0.5*T.sum((self.x-z)**2, axis=1) + L1_reg*(abs(y).sum())
    ... # Rest of it is same as original.

The above loss function puts an L1 penalty on the hidden layer output, which should (hopefully) drive most of the activations to 0. In place of this simple L1 penalty, I want to use the saturating penalty described above.

Any idea how to do this? Where do I compute y_prime? How do I do it symbolically?

I am a newbie to Theano, and still catching up with the symbolic computation part.

  • I don't think the step function is a good choice. Looking at the paper, they use nonlinearities which (a) are continuous (b) have gradient at some points. Your step function would essentially always give gradient of zero, so would be a very bad choice for a gradient descent procedure. – Dan Stowell Apr 30 '15 at 09:20

1 Answer


The nonlinearities in the paper are applied during coding, i.e. during calculation of the hidden values. Therefore, given your code example, they should be applied inside the get_hidden_values() function (NOT in the get_cost_updates() function). They should be the last piece of processing that get_hidden_values() does before returning.
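
In the tutorial code this looks something like the following (a sketch of the idea; self.W and self.b are the encoder weight matrix and bias already present in that class):

def get_hidden_values(self, input):
    # Affine pre-activation followed by the nonlinearity, applied as the
    # final step before returning the hidden values
    return T.nnet.sigmoid(T.dot(input, self.W) + self.b)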

Also, don't use numpy.abs in your symbolic expression, because that forces numpy to do the calculation. Instead you want Theano to do it, so just use abs and I think it should work as needed.
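
Putting it together, one rough sketch of the saturation penalty written symbolically (assuming the flat regions of your nonlinearity sit at 0 and 1; sat_reg here is a hypothetical weight for the penalty, analogous to your L1_reg) could be:

y = self.get_hidden_values(self.x)
z = self.get_reconstructed_input(y)

# Nearest flat value for each hidden activation, built symbolically with
# T.switch so it becomes part of the Theano computation graph
y_prime = T.switch(T.lt(y, 0.5), 0.0, 1.0)
saturation_penalty = abs(y - y_prime).sum()

L = 0.5 * T.sum((self.x - z) ** 2, axis=1) + sat_reg * saturation_penalty

Because y_prime is built from constants, the gradient of the penalty flows only through y, pushing each hidden value toward its nearest flat region; whether a step function is a suitable nonlinearity in the first place is the concern raised in the comment above.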

Dan Stowell