
I've developed a model that requires two versions of the network: one from before the training step and one from after. I thought I could simply do this with a tf.assign() call, but it seems to have massively slowed down the training.

Why does tf.assign() slow down execution?

This post asks a similar question, but that author only needed to update the learning rate, which can be done by adding a feed_dict. In my case, calling tf.assign can't really be avoided. The other suggested solution was to separate the graph definition from the graph run, but both models need to live in the same session (they need access to each other's parameters), so I'm not sure how to apply that here.

Any help is appreciated.

The code is as simple as:

tf.assign(var[0], var[2])  # old hidden params <- new hidden params
tf.assign(var[1], var[3])  # old value params  <- new value params

Q_agent.train(...)

Here, var[0] and var[1] are the parameters of the Q_agent.

The training time is quite long in this case. I've adapted the code to try to use a tf.placeholder instead. The code is as follows:

var = tf.trainable_variables()
params = [var[4], var[5]]
update_hidden = tf.placeholder(params[0].dtype, shape=params[0].get_shape())
update_value = tf.placeholder(params[1].dtype, shape=params[1].get_shape())

for step in range(num_steps):

    var = tf.trainable_variables()
    old_hidden = var[0]
    old_value = var[1]

    new_hidden = var[2]
    new_value = var[3]
    update_h = old_hidden.assign(update_hidden)
    update_v = old_value.assign(update_value)

    sess.run([update_h, update_v],
             feed_dict={update_hidden: new_hidden.eval(),
                        update_value: new_value.eval()})

The train function itself now runs quickly, but overall efficiency hasn't improved: performance still degrades continuously while running update_h and update_v. Any ideas?


1 Answer


Solved. The key is to define the tf.assign() ops once, outside the training loop. If you call tf.assign() inside the loop, every call adds a new node to the graph, so the graph keeps growing and each iteration does more computation than the last.
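To see the problem in isolation (a minimal sketch, assuming TF 1.x; the variables v and w here are hypothetical stand-ins), watch the op count grow when the assign is created inside the loop:

import tensorflow as tf

v = tf.Variable(0.0)
w = tf.Variable(1.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3):
        op = tf.assign(v, w)  # a brand-new graph node on every call
        sess.run(op)
        print(len(tf.get_default_graph().get_operations()))  # keeps climbing

Defining the assign ops up front avoids this: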

# update_hidden and update_value are the placeholders defined earlier
var = tf.trainable_variables()
old_hidden = var[0]
old_value = var[1]
update_h = old_hidden.assign(update_hidden)
update_v = old_value.assign(update_value)

for step in range(num_steps):

    # old_hidden and old_value are overwritten here
    var = tf.trainable_variables()
    old_hidden = var[0]
    old_value = var[1]

    new_hidden = var[2]
    new_value = var[3]
    # moved out of the loop:
    # update_h = old_hidden.assign(update_hidden)
    # update_v = old_value.assign(update_value)

    sess.run([update_h, update_v],
             feed_dict={update_hidden: new_hidden.eval(),
                        update_value: new_value.eval()})

I'm 100% sure there's a tidier way of doing this, but this is what I have!
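One tidier variant (a sketch only, not tested against the original model; it assumes the variable ordering from the question, an existing session sess, and a hypothetical num_steps loop bound) is to build the assign ops directly between the variable pairs, which drops the placeholder/eval round trip entirely:

var = tf.trainable_variables()
update_h = var[0].assign(var[2])  # old_hidden <- new_hidden
update_v = var[1].assign(var[3])  # old_value  <- new_value
copy_params = tf.group(update_h, update_v)  # run both copies in one call

for step in range(num_steps):
    sess.run(copy_params)  # no feed_dict, no .eval()
    # Q_agent.train(...)

Since the source variables are read inside the same graph, there's no need to pull the new values out to Python and feed them back in.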
