
It is easy to use two threads where one keeps feeding data into a queue and the other consumes data from the queue and performs the computation (a minimal sketch of this pattern follows the list below). Since TensorFlow recommends the Dataset API as the input pipeline after version 1.2.0, I would like to use a Dataset and its iterator to accomplish the task above, namely:

  1. There are two processes: one feeds and the other consumes;
  2. The pipeline suspends when it is either full or empty, and it stops when the computation on the consuming side finishes.
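
For reference, this is roughly the two-thread version I have in mind (a minimal sketch in plain Python; the data source and the computation are placeholders):

import queue
import threading

q = queue.Queue(maxsize=10)  # put() blocks when full, get() blocks when empty
SENTINEL = None              # signals the consumer that production is done

def producer():
    for item in range(100):  # stand-in for the real data source
        q.put(item)          # suspends while the queue is full
    q.put(SENTINEL)

def consumer():
    while True:
        item = q.get()       # suspends while the queue is empty
        if item is SENTINEL:
            break
        _ = item * 2         # stand-in for the real computation

threading.Thread(target=producer).start()
consumer()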

P.S. Why does TensorFlow use threads instead of processes in its Threading and Queues tutorial?

Thank you in advance.

Tengerye
  • [This question](https://stackoverflow.com/questions/44132579/feed-data-into-a-tf-contrib-data-dataset-like-a-queue) is related or might be the same question. – Albert Aug 21 '17 at 07:22

1 Answer


Distributed tf.contrib.data pipelines are not yet supported as of TensorFlow 1.3. We are working on support for splitting datasets across devices and/or processes, but that support is not yet ready.

In the meantime, the easiest way to achieve your goal is to use a `tf.FIFOQueue`. You can define a `Dataset` that reads from a queue as follows:

import tensorflow as tf

q = tf.FIFOQueue(...)  # the queue's capacity, dtypes, and shapes are elided here

# Define a dummy dataset that contains the same value repeated indefinitely.
dummy = tf.contrib.data.Dataset.from_tensors(0).repeat(None)

# Replace each dummy element with a value dequeued from the queue.
dataset_from_queue = dummy.map(lambda _: q.dequeue())

You can then compose other `Dataset` transformations with `dataset_from_queue`, as in the sketch below.
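
For instance, a fuller end-to-end sketch might look like the following, assuming TensorFlow 1.2/1.3-era APIs. The queue capacity, dtypes, batch size, and feeding loop are illustrative placeholders, not part of the original answer:

import threading
import tensorflow as tf

# Example queue; the capacity, dtypes, and shapes are placeholder values.
q = tf.FIFOQueue(capacity=100, dtypes=[tf.float32], shapes=[[]])
value_to_feed = tf.placeholder(tf.float32, shape=[])
enqueue_op = q.enqueue(value_to_feed)

# The Dataset-from-queue construction from the answer above.
dummy = tf.contrib.data.Dataset.from_tensors(0).repeat(None)
dataset_from_queue = dummy.map(lambda _: q.dequeue())

# Compose further transformations as usual, e.g. batching.
batched = dataset_from_queue.batch(32)
iterator = batched.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)

    # Producer thread: enqueue blocks when the queue is full, so the
    # producer suspends automatically until the consumer catches up.
    def feed():
        for i in range(320):  # stand-in for a real data source
            sess.run(enqueue_op, feed_dict={value_to_feed: float(i)})

    threading.Thread(target=feed, daemon=True).start()

    # Consumer: dequeue blocks when the queue is empty.
    print(sess.run(next_batch))  # one batch of 32 dequeued values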

mrry
  • When will support for distributed datasets be added? – Jie.Zhou Aug 22 '17 at 00:24
  • Thank you very much for your kind answer. I would also appreciate it if you would update or post a new answer once you have finished your work on distributed `tf.contrib.data`. @mrry – Tengerye Aug 22 '17 at 02:30