
I am trying to reuse the PTB language model on my own data, but I lack the TensorFlow knowledge to understand how it handles batch iteration over the training data. Here is how I understand batch iteration during training:

while epoch <= maxepoch do
  for minibatch in data_iterator() do
    model.forward(minibatch)
    (...)
  end
end
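In Python, that conventional loop might look like the sketch below (`make_batches` and the summing stand-in for `model.forward` are hypothetical placeholders, not part of any framework):

```python
# Framework-agnostic epoch/minibatch training loop.
# `make_batches` and the loss computation are illustrative stand-ins.
def make_batches(data, batch_size):
    # Yield consecutive, non-overlapping minibatches; drop the remainder.
    for start in range(0, len(data) - batch_size + 1, batch_size):
        yield data[start:start + batch_size]

def train(data, batch_size, max_epoch):
    losses = []
    for epoch in range(max_epoch):
        for minibatch in make_batches(data, batch_size):
            losses.append(sum(minibatch))  # stand-in for model.forward(minibatch)
    return losses
```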

It cannot get simpler than this, can it? Something similar is done in many other frameworks, but not in TensorFlow :) Here is the batch-producing function from the official PTB language model tutorial:

def ptb_producer(raw_data, batch_size, num_steps, name=None):
    with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
        raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

        data_len = tf.size(raw_data)
        batch_len = data_len // batch_size
        data = tf.reshape(raw_data[0:batch_size * batch_len],
                          [batch_size, batch_len])

        epoch_size = (batch_len - 1) // num_steps
        assertion = tf.assert_positive(
            epoch_size,
            message="epoch_size == 0, decrease batch_size or num_steps")
        with tf.control_dependencies([assertion]):
            epoch_size = tf.identity(epoch_size, name="epoch_size")

        i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
        x = tf.strided_slice(data, [0, i * num_steps],
                             [batch_size, (i + 1) * num_steps])
        x.set_shape([batch_size, num_steps])
        y = tf.strided_slice(data, [0, i * num_steps + 1],
                             [batch_size, (i + 1) * num_steps + 1])
        y.set_shape([batch_size, num_steps])
        return x, y
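The slicing arithmetic above can be reproduced in plain Python to see which windows each value of i selects (a sketch with nested lists standing in for tensors; this is not TF code):

```python
# Pure-Python emulation of the reshape + strided_slice logic in ptb_producer.
def ptb_batches(raw_data, batch_size, num_steps):
    batch_len = len(raw_data) // batch_size
    # Reshape raw_data[:batch_size * batch_len] into [batch_size, batch_len].
    data = [raw_data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        # x is a window of num_steps tokens; y is the same window shifted by one.
        x = [row[i * num_steps:(i + 1) * num_steps] for row in data]
        y = [row[i * num_steps + 1:(i + 1) * num_steps + 1] for row in data]
        yield x, y
```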

This function returns the x inputs and y targets once it is called. I see no sign of a Python iterator here, but there is a call to tf.strided_slice that uses the index i generated by tf.train.range_input_producer, so it should emulate a sliding window over the data. However, the function is called only once before training, so how can it iterate over my data? This is unclear to me. Can somebody explain this "magic" and seemingly obscure TensorFlow mechanism?


1 Answer


The "magic" is hidden in the line that calls tf.train.range_input_producer:

i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()

... which creates an op that pops values off a queue holding the integers 0..epoch_size-1. In other words, it iterates over the range 0..epoch_size-1.


Yes, it seems counter-intuitive at first. So here's a simple runnable example of working with queues in TensorFlow:

index = tf.train.range_input_producer(10, shuffle=False).dequeue()

with tf.Session() as sess:
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(15):
    print(sess.run(index))

  coord.request_stop()
  coord.join(threads)

Upon running, you should see the values from 0 to 9, then 5 more from 0 to 4. Note that sess.run evaluates the same tensor index, but it gets a different value each time. One can add further ops that depend on index and they will be evaluated with a new value of index.
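Conceptually, the queue behaves like an endless cycle over the index range. This can be emulated in plain Python (an illustrative analogue, not how TF implements it):

```python
import itertools

# Pure-Python analogue of range_input_producer(...).dequeue():
# an endless cycle over 0..epoch_size-1, from which each "evaluation"
# takes the next value.
def index_stream(epoch_size):
    return itertools.cycle(range(epoch_size))

stream = index_stream(10)
first_fifteen = [next(stream) for _ in range(15)]  # 0..9, then 0..4 again
```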

Also note that the queue operates in another thread, so to work with tf.train.range_input_producer one has to start a Coordinator and spawn the queue-runner threads (and stop them at the end). If you try to run the same example without a Coordinator, sess.run(index) will block script execution.

You can play around with this example, e.g., set shuffle=True, etc.


Going back to the PTB producer snippet:

i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
x = tf.strided_slice(data, [0, i*num_steps], [batch_size, (i+1)*num_steps])
x.set_shape([batch_size, num_steps])
y = tf.strided_slice(data, [0, i*num_steps+1], [batch_size, (i+1)*num_steps+1])
y.set_shape([batch_size, num_steps])

It should be clear now that even though x and y are defined as ordinary tensors, they effectively act as iterators over slices of data. All the thread work is taken care of by tf.train.Supervisor. So running an optimization op (which depends on x and y) will fetch new batches automatically.
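To tie it together, here is a conceptual pure-Python model of x and y as "tensors backed by a queue" (the names make_producer and run are assumptions for illustration, not the TF API): every call to run() pops the next index and slices a fresh batch, just as every sess.run of an op depending on x and y does.

```python
import itertools

# Conceptual model: each call to run() dequeues the next index i and
# returns the corresponding (x, y) slices, mimicking what evaluating
# the x/y tensors does in the TF graph.
def make_producer(data, batch_size, num_steps):
    batch_len = len(data) // batch_size
    grid = [data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    indices = itertools.cycle(range(epoch_size))

    def run():
        i = next(indices)
        x = [row[i * num_steps:(i + 1) * num_steps] for row in grid]
        y = [row[i * num_steps + 1:(i + 1) * num_steps + 1] for row in grid]
        return x, y

    return run

run = make_producer(list(range(20)), batch_size=2, num_steps=3)
x1, y1 = run()  # first batch (i = 0)
x2, y2 = run()  # next batch (i = 1), same "tensor", different value
```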


