
I am attempting to build two similar models that predict different output types. One predicts between two categories and the other has six output categories. Their inputs are the same, and both are LSTM RNNs.

I have separated training and prediction into separate functions in each model's file, model1.py and model2.py.

I have made the mistake of naming the variables in each model identically, so when I call predict1 and predict2 from model1 and model2 respectively, I get the following namespace error:

ValueError: Variable W already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

Here, W is the name of the weight matrix.
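A minimal sketch of how the clash arises (hypothetical names and shapes; the LSTM cell's internal weights are created via tf.get_variable, which does not auto-uniquify names the way tf.Variable does):

import tensorflow as tf

def build_model():
    # Both model files ask for a variable with the same name in the
    # same default graph; the second request raises the ValueError.
    return tf.get_variable('W', [100, 6])

build_model()
build_model()  # ValueError: Variable W already exists, disallowed. ...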

Is there a good way to run these predictions from the same place? I have attempted to rename the variables involved, but I still get the following error. It doesn't seem to be possible to name an lstm_cell on its creation, is it?

ValueError: Variable RNN/BasicLSTMCell/Linear/Matrix already exists

EDIT: After adding variable scopes around model1pred and model2pred in the predictions file, I get the following error when calling model1pred() and then model2pred():

tensorflow.python.framework.errors.NotFoundError: Tensor name "model1/model1/BasicLSTMCell/Linear/Matrix" not found in checkpoint files './variables/model1.chk'

EDIT: The code is included below. The code in model2.py is omitted, but it is equivalent to that in model1.py except that n_classes=2 and, within the dynamicRNN function and inside pred, the scope is set to 'model2'.
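(For concreteness, a sketch of what "the scope is set" means here, assuming the scope keyword argument of tf.nn.rnn mentioned in the comments below:)

outputs, states = tf.nn.rnn(lstm_cell, x, dtype=tf.float32,
                            sequence_length=seqlen, scope='model2')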

SOLUTION: The problem was that the graph the saver was trying to restore included variables from the first pred() execution. I was able to wrap the calls to the pred functions in different graphs to solve the issue, which removed the need for variable scoping.

In the collect predictions file:

import tensorflow as tf

def model1pred(test_x, test_seqlen):
    from model1 import pred
    # A fresh graph isolates model1's variables (and its Saver).
    with tf.Graph().as_default():
        return pred(test_x, test_seqlen)

def model2pred(test_x, test_seqlen):
    from model2 import pred
    # Likewise, model2 builds and restores inside its own graph.
    with tf.Graph().as_default():
        return pred(test_x, test_seqlen)

# Import test_x, test_seq

probs1, preds1 = model1pred(test_x, test_seq)
probs2, preds2 = model2pred(test_x, test_seq)
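This works because each pred() call now creates its variables and its tf.train.Saver inside a fresh graph, so tf.all_variables() inside pred() contains only that model's variables, and the restore no longer looks for the other model's weights in the checkpoint. An alternative that keeps everything in one graph (a sketch, not from the original post) would be to give each Saver an explicit var_list restricted to its own variable scope:

# Hypothetical single-graph alternative: only save/restore model1's variables.
saver = tf.train.Saver(
    [v for v in tf.all_variables() if v.name.startswith('model1/')])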

In model1.py:

import numpy as np
import tensorflow as tf
from tensorflow.python.ops import rnn_cell  # provides BasicLSTMCell in TF 0.x

def dynamicRNN(x, seqlen, weights, biases):
    n_steps = 10
    n_input = 14
    n_classes = 6
    n_hidden = 100

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output, providing 'sequence_length' will perform dynamic calculation.
    outputs, states = tf.nn.rnn(lstm_cell, x, dtype=tf.float32, sequence_length=seqlen)

    # When performing dynamic calculation, we must retrieve the last
    # dynamically computed output, i.e, if a sequence length is 10, we need
    # to retrieve the 10th output.
    # However TensorFlow doesn't support advanced indexing yet, so we build
    # a custom op that for each sample in batch size, get its length and
    # get the corresponding relevant output.

    # 'outputs' is a list of output at every timestep, we pack them in a Tensor
    # and change back dimension to [batch_size, n_step, n_input]
    outputs = tf.pack(outputs)
    outputs = tf.transpose(outputs, [1, 0, 2])

    # Hack to build the indexing and retrieve the right output.
    batch_size = tf.shape(outputs)[0]
    # Start indices for each sample
    index = tf.range(0, batch_size) * n_steps + (seqlen - 1)
    # Indexing
    outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index)

    # Linear activation, using outputs computed above
    return tf.matmul(outputs, weights['out']) + biases['out']

def softmax(v):
    # Helper assumed by pred(); not shown in the original post.
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(v - np.max(v))
    return e / e.sum()

def pred(test_x, test_seqlen):
    with tf.Session() as sess:
        n_steps = 10
        n_input = 14
        n_classes = 6
        n_hidden = 100
        weights = {'out': tf.Variable(tf.random_normal([n_hidden, n_classes]), name='W1')}
        biases = {'out': tf.Variable(tf.random_normal([n_classes]), name='b1')}
        x = tf.placeholder("float", [None, n_steps, n_input])
        y = tf.placeholder("float", [None, n_classes])  # unused here; kept from the training code
        seqlen = tf.placeholder(tf.int32, [None])

        pred = dynamicRNN(x, seqlen, weights, biases)
        saver = tf.train.Saver(tf.all_variables())
        y_p = tf.argmax(pred, 1)

        init = tf.initialize_all_variables()
        sess.run(init)

        saver.restore(sess, './variables/model1.chk')
        y_prob, y_pred = sess.run([pred, y_p], feed_dict={x: test_x, seqlen: test_seqlen})
        y_prob = np.array([softmax(row) for row in y_prob])
        return y_prob, y_pred


John
  • Maybe create one of the models in a custom [variable_scope](https://www.tensorflow.org/versions/r0.10/how_tos/variable_scope/index.html) block? – Yaroslav Bulatov Aug 03 '16 at 18:11
  • Do you really need the gigantic prose to explain your problem? Consider dividing the question so that it's easy to see what the core of your issue is, rather than throwing in lots of lines of code or explaining the motivation of your problem. This site is more about coding, so try to focus on that. – Charlie Parker Nov 22 '16 at 00:00
  • Also, the title of your question seems rather broad while the details seem rather specific. Can you change the title to better reflect what your question is about? – Charlie Parker Nov 22 '16 at 00:01

1 Answer


You can do this by adding with tf.variable_scope(): blocks around the two pieces of model-construction code. This gives each model's variables a different name prefix, which avoids the clash.

For example (using the model1pred() and model2pred() functions defined in your question):

with tf.variable_scope('model1'):
  # Variables created in here will be named 'model1/W', etc.
  probs1, preds1 = model1pred(test_x, test_seq)

with tf.variable_scope('model2'):
  # Variables created in here will be named 'model2/W', etc.
  probs2, preds2 = model2pred(test_x, test_seq)

For more details, see the in-depth HOWTO on variable sharing in TensorFlow.
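As a quick illustration (a sketch, not part of the original answer), the prefix can be seen directly on a variable's name:

with tf.variable_scope('model1'):
    v = tf.get_variable('W', [100, 6])

print(v.name)  # -> model1/W:0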

mrry
  • I'll note that the models are in separate files, if that changes anything. I wrapped each method for training and making predictions with each model in variable_scopes. Inside the separate method which creates the LSTM cell I also set tf.nn.rnn(...., scope='model1'). Each model runs when the other is not, as before, but the second will fail if run in succession. – John Aug 03 '16 at 20:32
  • Does it work if you invoke the code in different outermost variable scopes? (The file should have no effect on the variable scope.) If not, can you update the question with the top-level code for your program? – mrry Aug 03 '16 at 20:36
  • I assumed that by invoking the code in a different outermost variable scope you mean to wrap the pred functions in model1pred and model2pred in variable scopes in the prediction file when calling the function? This did not fix the error. I have edited the code into the original post. – John Aug 03 '16 at 21:22
  • I updated the answer to show how you'd scope the two models, based on the functions in your code. Let me know if that doesn't work. – mrry Aug 03 '16 at 21:37
  • I attempted this, removing the scope from inside the model1/2.py pred() function so that each function would work when called independently. The second model appears to look for variables in the save file of the first model called? – John Aug 04 '16 at 15:24
  • Are you creating `tf.train.Saver` objects in the `model1pred()` and `model2pred()` functions? By default the `Saver` constructor will attempt to save/restore `tf.all_variables()`, which when you call `model2pred()` will include the variables created by `model1pred()`. At this point it might be easier to refactor the code so that you use a different `tf.Graph` for each model... it looks from your code (I can't quite tell due to the formatting) that the individual functions return NumPy arrays and not tensors, so there's not much (beneficial) sharing going on. – mrry Aug 04 '16 at 16:47
  • I apologize for the formatting, thank you for your help! Yes, I am creating Savers inside each pred() function executed by model1pred() and model2pred(). How do I go about using a different graph? I suppose I believed that by naming the variables differently the graphs would be disconnected and distinct. – John Aug 04 '16 at 18:02
  • That's a reasonable thing to believe :), but some of the components (like the `Saver`) aren't scope-aware and their default behavior introduces implicit, unintended dependencies. To use different graphs, you would wrap each `model*pred()` function in a `with tf.Graph().as_default():` block instead of a variable scope. Note that you would no longer be able to pass `tf.Tensor` objects into these functions (because they would come from a different graph), so common code (e.g. for generating `test_x` and `test_seq`) might need to move into those functions; I'm not sure if that affects you though. – mrry Aug 04 '16 at 18:06
  • Thank you a ton, I knew I was in luck when I saw your username! This worked perfectly and allows me to remove the variable scoping. I'll update my post with the changes. – John Aug 04 '16 at 18:27