
Following Tensorflow LSTM Regularization, I am trying to add a regularization term to the cost function when training the parameters of LSTM cells.

Putting aside some constants, I have:

def RegularizationCost(trainable_variables):
    cost = 0
    for v in trainable_variables:
        cost += r(tf.reduce_sum(tf.pow(r(v.name),2)))
    return cost

...

regularization_cost = tf.placeholder(tf.float32, shape = ())
cost = tf.reduce_sum(tf.pow(pred - y, 2)) + regularization_cost
optimizer = tf.train.AdamOptimizer(learning_rate = 0.01).minimize(cost)

...

tv = tf.trainable_variables()
s = tf.Session()
r = s.run

...

while (...):
    ...

    reg_cost = RegularizationCost(tv)
    r(optimizer, feed_dict = {x: x_b, y: y_b, regularization_cost: reg_cost})

The problem I have is that adding the regularization term hugely slows the learning process. Moreover, reg_cost visibly increases with each iteration, while the term associated with pred - y pretty much stagnates, i.e. reg_cost seems not to be taken into account.

I suspect I am adding this term in a completely wrong way. I did not know how to add it to the cost function itself, so I used a workaround with a scalar tf.placeholder and "manually" calculated the regularization cost. How do I do it properly?


2 Answers


Compute the L2 loss only once, when building the graph:

tv = tf.trainable_variables()
regularization_cost = tf.reduce_sum([ tf.nn.l2_loss(v) for v in tv ])
cost = tf.reduce_sum(tf.pow(pred - y, 2)) + regularization_cost
optimizer = tf.train.AdamOptimizer(learning_rate = 0.01).minimize(cost)

You might want to exclude the bias variables, as those should not be regularized.
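For reference, `tf.nn.l2_loss(v)` computes `sum(v ** 2) / 2`. A minimal NumPy sketch of the regularization term above, including the suggested bias filtering (the variable names here are hypothetical stand-ins, not from the question):

```python
import numpy as np

# Toy stand-ins for trainable variables; the names are hypothetical.
variables = {
    "lstm/kernel": np.array([[1.0, -2.0], [3.0, 0.5]]),
    "lstm/bias": np.array([0.1, 0.2]),
}

def l2_loss(v):
    # Same definition tf.nn.l2_loss uses: sum(v ** 2) / 2.
    return np.sum(v ** 2) / 2.0

# Regularize everything except bias variables, as suggested above.
reg = sum(l2_loss(v) for name, v in variables.items() if "bias" not in name)
# reg == (1 + 4 + 9 + 0.25) / 2 == 7.125
```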

fabrizioM

It slows down because your code creates new nodes in every iteration: each tf.XXX call adds a new node to the graph. This is not how you code with TF. First you build your whole graph, including the regularization terms; then, in the while loop, you only execute it.
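Putting both answers together, a minimal end-to-end sketch of the build-once / run-many pattern. The model, shapes, and toy data are made up for illustration; it uses tf.compat.v1 so it runs under TensorFlow 2.x, whereas the question's TF 1.x code would use tf directly:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# --- Build the whole graph once, regularization included ---
x = tf.placeholder(tf.float32, shape=(None, 3))
y = tf.placeholder(tf.float32, shape=(None, 1))
w = tf.Variable(tf.random_normal((3, 1)), name="w")
b = tf.Variable(tf.zeros((1,)), name="b")
pred = tf.matmul(x, w) + b

# The regularization term is part of the graph, so its gradient
# actually reaches the optimizer (unlike a fed-in placeholder value).
tv = tf.trainable_variables()
regularization_cost = 0.01 * tf.reduce_sum([tf.nn.l2_loss(v) for v in tv])
cost = tf.reduce_sum(tf.pow(pred - y, 2)) + regularization_cost
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)

# --- Training loop: only execute, never create nodes ---
with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    x_b = np.random.rand(8, 3).astype(np.float32)
    y_b = np.random.rand(8, 1).astype(np.float32)
    for _ in range(100):
        # No tf.* calls here: nothing new is added to the graph.
        s.run(optimizer, feed_dict={x: x_b, y: y_b})
    final_cost = s.run(cost, feed_dict={x: x_b, y: y_b})
```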

lejlot