
I'm using tf.contrib.data functions for my input pipeline during training (without placeholders). My question is: how do I reuse the trained model and feed in new data at test time? The question is similar to this one, except that I would like to avoid placeholders at test time as well: my test dataset could be very large, so the feed_dict slowdown of placeholders should be avoided there too.

Is there any way to replace the input pipeline with a new one at test time?

Jason
  • You can switch between input pipelines; see https://stackoverflow.com/questions/41162955/tensorflow-queues-switching-between-train-and-validation-data – Vijay Mariappan Jul 25 '17 at 21:29
  • I switch between iterators for training/validation, but at test time I want to be able to plug in some arbitrary data - e.g. data that was not available when the model was trained. – Jason Jul 25 '17 at 21:35

1 Answer


I am not sure there is an optimal way to solve this problem, but here is how I solved it:

My model is a simple MLP, so my model() function contains lines like these:

# Training flow
train_layer = tf.add(tf.matmul(x_train, weights['w1']), biases['b1'])
train_layer = tf.nn.relu(train_layer)
# Test flow: the same weights and biases, applied to a different input tensor
test_layer = tf.add(tf.matmul(x_test, weights['w1']), biases['b1'])
test_layer = tf.nn.relu(test_layer)

As you can see, I have two inputs, x_train and x_test. These are the handles that pull batches of data from the tf.contrib.data dataset iterators:

x_train, x_train_labels = train_iter.get_next()
x_test, x_test_labels = test_iter.get_next()

So I essentially have two flows of data through the same graph, on which exactly the same operations are performed. I also have two outputs of the model, mlp_train and mlp_test, depending on whether the model was evaluated with the x_train or x_test input.

Now, if you create your optimiser from the mlp_train output and your testing metrics from the mlp_test output, you simply run sess.run(optimiser) to train on the training dataset and sess.run(test_metrics) to evaluate on the test dataset, and you never need a feed_dict.
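Putting the pieces together, a minimal runnable sketch of this pattern looks like the following (the data, layer sizes, loss, and metric are illustrative stand-ins, not your actual model):

import numpy as np
import tensorflow as tf

# Illustrative data; substitute your real arrays or file-based datasets.
train_x = np.random.rand(1000, 784).astype(np.float32)
train_y = np.eye(10)[np.random.randint(10, size=1000)].astype(np.float32)
test_x = np.random.rand(200, 784).astype(np.float32)
test_y = np.eye(10)[np.random.randint(10, size=200)].astype(np.float32)

train_data = (tf.contrib.data.Dataset.from_tensor_slices((train_x, train_y))
              .shuffle(1000).batch(128).repeat())
test_data = tf.contrib.data.Dataset.from_tensor_slices((test_x, test_y)).batch(128)

train_iter = train_data.make_one_shot_iterator()
test_iter = test_data.make_initializable_iterator()
x_train, x_train_labels = train_iter.get_next()
x_test, x_test_labels = test_iter.get_next()

weights = {'w1': tf.Variable(tf.random_normal([784, 256], stddev=0.1)),
           'out': tf.Variable(tf.random_normal([256, 10], stddev=0.1))}
biases = {'b1': tf.Variable(tf.zeros([256])),
          'out': tf.Variable(tf.zeros([10]))}

def mlp(x):
    # The same variables serve both flows, so only one model is trained.
    hidden = tf.nn.relu(tf.add(tf.matmul(x, weights['w1']), biases['b1']))
    return tf.add(tf.matmul(hidden, weights['out']), biases['out'])

mlp_train = mlp(x_train)
mlp_test = mlp(x_test)

loss = tf.losses.softmax_cross_entropy(onehot_labels=x_train_labels, logits=mlp_train)
optimiser = tf.train.AdamOptimizer().minimize(loss)
correct = tf.equal(tf.argmax(mlp_test, 1), tf.argmax(x_test_labels, 1))
test_metrics = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(optimiser)        # consumes the training pipeline
    sess.run(test_iter.initializer)
    print(sess.run(test_metrics)) # one test batch; loop until
                                  # tf.errors.OutOfRangeError for the full set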

EDIT: I read your comment about using "data that was not available when the model was trained", and I don't think this answer satisfies that.

John Scolaro
  • Yeah, that wouldn't allow plugging in an arbitrary new input pipeline later either. There should be some way of reconnecting a model input to something else, but I still can't find anything. – Jason Jul 26 '17 at 17:59
  • Also, this seems like an unnecessarily complicated way of switching between training and validation. You can use a placeholder to switch between two iterators instead of writing your model twice. If the placeholder is a boolean there should be minimal overhead (i.e. not a batch of images). – Jason Jul 26 '17 at 19:44
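For reference, here is a minimal sketch of the iterator-switching idea from the comment above. Instead of a boolean and tf.cond, it uses a feedable iterator (tf.data.Iterator.from_string_handle in TF 1.4+, exposed under tf.contrib.data in some earlier builds): the model is built once against a generic iterator, and a string placeholder, as cheap to feed as a boolean, selects which concrete pipeline supplies the data. A new pipeline, even one built after training, can be plugged in by feeding its handle. The datasets here are illustrative:

import numpy as np
import tensorflow as tf

train_arrays = np.random.rand(1000, 4).astype(np.float32)
test_arrays = np.random.rand(200, 4).astype(np.float32)

train_data = tf.contrib.data.Dataset.from_tensor_slices(train_arrays).batch(128).repeat()
test_data = tf.contrib.data.Dataset.from_tensor_slices(test_arrays).batch(128)

# A string-handle placeholder selects which pipeline feeds the graph.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.contrib.data.Iterator.from_string_handle(
    handle, train_data.output_types, train_data.output_shapes)
x = iterator.get_next()  # build the model once, on this tensor

train_iter = train_data.make_one_shot_iterator()
test_iter = test_data.make_initializable_iterator()

with tf.Session() as sess:
    train_handle = sess.run(train_iter.string_handle())
    test_handle = sess.run(test_iter.string_handle())
    print(sess.run(x, feed_dict={handle: train_handle}).shape)  # training batch
    sess.run(test_iter.initializer)
    print(sess.run(x, feed_dict={handle: test_handle}).shape)   # test batch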