
I'm trying to create a train operation based on the CIFAR-10 example from TensorFlow that uses tf.RandomShuffleQueue, and my labels come from the file names, as described in "Accessing filename from file queue in Tensor Flow". How can I combine that approach with this code?

When I try to run the following code, where path is a directory with many files:

filenames = [os.path.join(path, f) for f in os.listdir(path)][1:]
file_fifo = tf.train.string_input_producer(filenames,
                                           shuffle=False,
                                           capacity=len(filenames))
reader = tf.WholeFileReader()
key, value = reader.read(file_fifo)
image = tf.image.decode_png(value, channels=3, dtype=tf.uint8)
image.set_shape([config.image_height, config.image_width, config.image_depth])
image = tf.cast(image, tf.float32)
image = tf.divide(image, 255.0)
labels = [int(os.path.basename(f).split('_')[-1].split('.')[0]) for f in filenames]
label_fifo = tf.FIFOQueue(len(filenames), tf.int32, shapes=[[]])
label_enqueue = label_fifo.enqueue_many([tf.constant(labels)])
label = label_fifo.dequeue()
bq = tf.RandomShuffleQueue(capacity=16 * batch_size,
                           min_after_dequeue=8 * batch_size,
                           dtypes=[tf.float32, tf.int32])
batch_enqueue_op = bq.enqueue([image, label_enqueue])
runner = tf.train.queue_runner.QueueRunner(bq, [batch_enqueue_op] * num_threads)
tf.train.add_queue_runner(runner)

# Read 'batch' labels + images from the example queue.
images, labels = bq.dequeue_many(FLAGS.batch_size)
labels = tf.reshape(labels, [FLAGS.batch_size, 1])

I get obvious errors, because I know my code doesn't make much sense. I have two FIFO queues, file_fifo and label_fifo, but I don't know how to make label_fifo an input to my tf.RandomShuffleQueue.
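My guess is that the enqueue should take the dequeued label tensor rather than the label_enqueue op, and that the RandomShuffleQueue needs explicit shapes for dequeue_many to work, roughly like the sketch below (same names as above; the shapes argument is my assumption, and I'm not sure this is right):

# Sketch only: enqueue the dequeued label tensor, not the enqueue op,
# and declare shapes so dequeue_many can assemble fixed-shape batches.
bq = tf.RandomShuffleQueue(capacity=16 * batch_size,
                           min_after_dequeue=8 * batch_size,
                           dtypes=[tf.float32, tf.int32],
                           shapes=[[config.image_height, config.image_width, config.image_depth], []])
batch_enqueue_op = bq.enqueue([image, label])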

Can someone help? Thank you :-)


1 Answer


I changed my code to:

filenames = [os.path.join(FLAGS.data_path, f) for f in os.listdir(FLAGS.data_path)][1:]
np.random.shuffle(filenames)
file_fifo = tf.train.string_input_producer(filenames, shuffle=False, capacity=len(filenames))
reader = tf.WholeFileReader()
key, value = reader.read(file_fifo)
image = tf.image.decode_png(value, channels=3, dtype=tf.uint8)
image.set_shape([config.image_height, config.image_width, config.image_depth])
image = tf.cast(image, tf.float32)
image = tf.divide(image, 255.0)

labels = [int(os.path.basename(f).split('_')[-1].split('.')[0]) for f in filenames]
label_fifo = tf.FIFOQueue(len(filenames), tf.int32, shapes=[[]])
label_enqueue = label_fifo.enqueue_many([tf.constant(labels)])
label = label_fifo.dequeue()

if is_train:
    images, label_batch = tf.train.shuffle_batch([image, label],
                                                 batch_size=FLAGS.batch_size,
                                                 num_threads=FLAGS.num_threads,
                                                 capacity=16 * FLAGS.batch_size,
                                                 min_after_dequeue=8 * FLAGS.batch_size)
labels = tf.reshape(label_batch, [FLAGS.batch_size, 1])

For training I have:

class _LoggerHook(tf.train.SessionRunHook):
    """Logs loss and runtime."""

    def begin(self):
        self._step = -1

    def before_run(self, run_context):
        self._step += 1
        self._start_time = time.time()
        if self._step % int(config.train_examples / FLAGS.batch_size) == 0 or self._step == 0:
            run_context.session.run(label_enqueue)
        return tf.train.SessionRunArgs({'loss': loss, 'net': net})

and I run training as:

with tf.train.MonitoredTrainingSession(
        checkpoint_dir=FLAGS.train_path,
        hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps), tf.train.NanTensorHook(loss), _LoggerHook()],
        config=tf.ConfigProto(log_device_placement=FLAGS.log_device_placement)) as mon_sess:
        while not mon_sess.should_stop():
            mon_sess.run(train_op)

The training starts, but it only runs the very first step and then hangs, presumably because it is waiting on some queue operation.
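A sketch of what I might try next (not tested, and not what the CIFAR-10 example does): pair the filenames and labels with tf.train.slice_input_producer instead of keeping a separate label FIFOQueue. That way a built-in queue runner keeps images and labels in sync and refills them every epoch, so there is no label_enqueue op that has to be run by hand from a hook:

import os
import tensorflow as tf

filenames = [os.path.join(FLAGS.data_path, f) for f in os.listdir(FLAGS.data_path)]
labels = [int(os.path.basename(f).split('_')[-1].split('.')[0]) for f in filenames]

# One shuffling input producer yields matching (filename, label) pairs
# and is refilled automatically by its own queue runner.
filename_t, label_t = tf.train.slice_input_producer(
    [tf.constant(filenames), tf.constant(labels, dtype=tf.int32)],
    shuffle=True)

value = tf.read_file(filename_t)
image = tf.image.decode_png(value, channels=3, dtype=tf.uint8)
image.set_shape([config.image_height, config.image_width, config.image_depth])
image = tf.divide(tf.cast(image, tf.float32), 255.0)

images, label_batch = tf.train.shuffle_batch(
    [image, label_t],
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_threads,
    capacity=16 * FLAGS.batch_size,
    min_after_dequeue=8 * FLAGS.batch_size)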