14

TensorFlow offers a nice LSTM wrapper:

rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None,
                       state_is_tuple=False, activation=tanh)

I would like to use regularization, say L2 regularization. However, I don't have direct access to the different weight matrices used in the LSTM cell, so I cannot explicitly do something like

loss = something + beta * tf.reduce_sum(tf.nn.l2_loss(weights))

Is there a way to access the matrices or use regularization somehow with LSTM?

BiBi
  • I put a whole process as an answer to your question. Chk out https://stackoverflow.com/questions/37869744/tensorflow-lstm-regularization/46761296#46761296 – sdr2002 Oct 16 '17 at 00:10

3 Answers

13

tf.trainable_variables gives you a list of Variable objects that you can use to build the L2 regularization term. Note that this adds regularization for all variables in your model. If you want to restrict the L2 term to a subset of the weights, you can use a name_scope to give your variables specific name prefixes, and later use those prefixes to filter the list returned by tf.trainable_variables.
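A minimal sketch of that filtering step, using a plain-Python stand-in for the variable list (the Var tuple and the names below are illustrative; in real code you would iterate over tf.trainable_variables()):

```python
from collections import namedtuple

# Stand-in for tf.Variable; only the .name attribute matters for filtering.
# (Assumption: real code would use tf.trainable_variables() instead.)
Var = namedtuple("Var", ["name"])

def l2_candidates(variables, scope_prefix):
    """Keep only variables created under the given name_scope prefix."""
    return [v for v in variables if v.name.startswith(scope_prefix + "/")]

trainable = [
    Var("lstm/kernel:0"),
    Var("lstm/bias:0"),
    Var("softmax/weights:0"),
]
selected = l2_candidates(trainable, "lstm")
print([v.name for v in selected])  # ['lstm/kernel:0', 'lstm/bias:0']
```

The L2 sum would then be built over `selected` only, instead of over every trainable variable.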

keveman
13

I like to do the following, though the one caveat I know of is that some parameters are better left out of L2 regularization, such as batch norm parameters and biases. An LSTM contains a single bias tensor (conceptually it has many biases, but they are concatenated for performance), and for batch normalization I add "noreg" to the variables' names so that they are ignored too.

loss = ...  # your regular output loss
l2 = lambda_l2_reg * sum(
    tf.nn.l2_loss(tf_var)
    for tf_var in tf.trainable_variables()
    if not ("noreg" in tf_var.name or "Bias" in tf_var.name)
)
loss += l2

where lambda_l2_reg is a small multiplier, e.g. 0.005.
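The selective sum can be sketched with plain-Python stand-ins for the tensors (the Var class, names, and values below are illustrative; tf.nn.l2_loss(v) computes sum(v**2) / 2):

```python
# Plain-Python sketch of the selective L2 term above. The Var class and the
# variable names/values are illustrative stand-ins for tf.trainable_variables().
class Var:
    def __init__(self, name, values):
        self.name, self.values = name, values

def l2_loss(values):
    # Mirrors tf.nn.l2_loss: sum of squares divided by 2.
    return sum(x * x for x in values) / 2.0

lambda_l2_reg = 0.005
trainable = [
    Var("rnn/lstm_cell/Kernel:0", [0.5, -1.0]),
    Var("rnn/lstm_cell/Bias:0", [0.1]),   # skipped: "Bias" in name
    Var("bn/gamma_noreg:0", [1.0]),       # skipped: "noreg" in name
]
l2 = lambda_l2_reg * sum(
    l2_loss(v.values)
    for v in trainable
    if not ("noreg" in v.name or "Bias" in v.name)
)
print(l2)  # 0.005 * (0.25 + 1.0) / 2 = 0.003125
```

Only the kernel contributes to the penalty; the bias and the "noreg"-tagged batch norm parameter are filtered out by the if clause.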

Doing this selection (the full if in the loop, which discards some variables from regularization) took me from an F1 score of 0.879 to 0.890 in a single test run, without readjusting the config's lambda value. This included both the change for batch normalization and the one for biases, and I had other biases in the neural network as well.

According to this paper, regularizing the recurrent weights may help with exploding gradients.

Also, according to this other paper, dropout is better applied between stacked cells rather than inside cells, if you use it.

As for the exploding gradient problem: if you apply gradient clipping to a loss that already includes the L2 regularization term, that regularization will be taken into account during the clipping process too.


P.S. Here is the neural network I was working on: https://github.com/guillaume-chevalier/HAR-stacked-residual-bidir-LSTMs

Guillaume Chevalier
0

TensorFlow has some built-in and helper functions that let you apply L2 norms to your gradients, such as tf.clip_by_global_norm (note that this clips gradients by their global L2 norm; it is not an L2 penalty on the weights):

    # ^^^ define your LSTM above here ^^^

    params = tf.trainable_variables()

    # Gradients of the loss with respect to every trainable variable
    gradients = tf.gradients(self.losses, params)

    # Rescale gradients so that their global L2 norm is at most max_gradient_norm
    clipped_gradients, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
    self.gradient_norms = norm

    opt = tf.train.GradientDescentOptimizer(self.learning_rate)
    self.updates = opt.apply_gradients(
        zip(clipped_gradients, params), global_step=self.global_step)

In your training step, run:

    outputs = session.run([self.updates, self.gradient_norms, self.losses], input_feed)
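For intuition, tf.clip_by_global_norm rescales the whole gradient list by min(1, max_norm / global_norm), where global_norm is the L2 norm over all gradients taken together. A plain-Python sketch of that rule (the gradient values are illustrative):

```python
import math

def clip_by_global_norm(gradients, max_norm):
    """Mirror of tf.clip_by_global_norm for flat Python lists of numbers."""
    global_norm = math.sqrt(sum(g * g for grads in gradients for g in grads))
    scale = min(1.0, max_norm / global_norm) if global_norm > 0 else 1.0
    clipped = [[g * scale for g in grads] for grads in gradients]
    return clipped, global_norm

grads = [[3.0, 4.0], [0.0]]  # global norm = sqrt(9 + 16) = 5.0
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [[0.6, 0.8], [0.0]] -- rescaled to norm 1.0
```

Because the whole list is scaled by a single factor, the relative magnitudes of the gradients are preserved; only their overall norm is capped.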
j314erre