
I was following another example from this link, which is about the tf.data.Dataset.interleave() method.

import tensorflow as tf
import tensorflow.keras as keras

def do_range(i):
    for j in tf.range(i):
        yield j

ds = tf.data.Dataset.range(10).interleave(
    lambda ind: tf.data.Dataset.from_generator(do_range, args=(ind,), output_types=tf.int64))

To understand this example, I started with smaller inputs: range(3), range(4), and so on.
For example,

ds = tf.data.Dataset.range(3).interleave(
    lambda ind: tf.data.Dataset.from_generator(do_range, args=(ind,), output_types=tf.int64))
res = [x for x in ds.as_numpy_iterator()]
# This returns [0, 0, 1]
ds = tf.data.Dataset.range(4).interleave(
    lambda ind: tf.data.Dataset.from_generator(do_range, args=(ind,), output_types=tf.int64))
res = [x for x in ds.as_numpy_iterator()]
# This returns [0, 0, 0, 1, 1, 2]
ds = tf.data.Dataset.range(5).interleave(
    lambda ind: tf.data.Dataset.from_generator(do_range, args=(ind,), output_types=tf.int64))
res = [x for x in ds.as_numpy_iterator()]
# This returns [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

Up to range(9), it returns the dataset I expected.
From range(10) onward, it does not.

ds = tf.data.Dataset.range(10).interleave(
    lambda ind: tf.data.Dataset.from_generator(do_range, args=(ind,), output_types=tf.int64))
res = [x for x in ds.as_numpy_iterator()]
# This returns [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3, 3, 2, 4, 4, 4, 4, 3, 5, 5, 5, 4, 6, 6, 5, 7, 6, 7, 8]
# Notice that the 0s, 1s, and 2s are no longer grouped together.

What is going on here? Shouldn't this be [0, ..., 0, 1, ..., 1, 2, ..., 2, ..., 8]?
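
A possible explanation, offered as an assumption rather than a confirmed answer: the tf.data documentation says that when `cycle_length` is not set, the runtime chooses it based on the available CPU, so only that many sub-datasets are open at once. The pure-Python sketch below models a round-robin interleave with a fixed number of slots and a `block_length` of 1. The helper `simulate_interleave` and the value `cycle_length=8` are hypothetical; the sketch does not call TensorFlow and may not match its internals exactly.

```python
def simulate_interleave(n, cycle_length):
    """Toy model of interleaving range(0), range(1), ..., range(n - 1).

    Hypothetical helper, not a TensorFlow API. At most `cycle_length`
    sub-iterators are open at once; one element is taken per visit.
    """
    inputs = iter(range(n))        # stands in for tf.data.Dataset.range(n)
    slots = [None] * cycle_length  # currently open sub-iterators
    num_open = 0
    input_exhausted = False
    idx = 0                        # slot currently being visited
    out = []
    while not input_exhausted or num_open > 0:
        # An empty slot is refilled from the next input element, if any.
        if slots[idx] is None and not input_exhausted:
            i = next(inputs, None)
            if i is None:
                input_exhausted = True
            else:
                slots[idx] = iter(range(i))  # stands in for do_range(i)
                num_open += 1
        # Take one element from the slot, or close it when it runs out.
        if slots[idx] is not None:
            value = next(slots[idx], None)
            if value is None:
                slots[idx] = None
                num_open -= 1
            else:
                out.append(value)
        idx = (idx + 1) % cycle_length
    return out


# Assumed cycle_length of 8 (for example, a machine with 8 cores).
print(simulate_interleave(5, 8))   # [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
print(simulate_interleave(10, 8))  # ungrouped, like the range(10) output above
print(simulate_interleave(10, 10)) # nine 0s, eight 1s, ..., one 8
```

Under this model, the output stays grouped while the number of non-empty sub-datasets fits in the slots, and becomes mixed once a finished slot is refilled with a later input. If that is the cause here, passing an explicit `cycle_length` to `interleave()` should make the result independent of the machine, which might also explain the different results on Colab.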

HyeonPhil Youn
  • I am unable to reproduce this on TF 2.7. Which version are you using? – Frightera Jan 12 '22 at 15:10
  • @Frightera TF 2.5..that's strange.. – HyeonPhil Youn Jan 12 '22 at 15:53
  • @Frightera I checked it again but still got the same result. Did you get what I expected for the case of `tf.data.Dataset.range(10)...`? – HyeonPhil Youn Jan 12 '22 at 15:56
  • What output do you expect? – AloneTogether Jan 12 '22 at 16:02
  • @AloneTogether Analogous to the previous examples, the output for `tf.data.Dataset.range(10).interleave(...)` should be `[0, 0, 0, ..., 0, 1, ..., 1, 2, ..., 2, ..., 3, ..., 3, 4, ..., 4, 5, ..., 8]`. In words: nine 0s, eight 1s, seven 2s, six 3s, ..., one 8. – HyeonPhil Youn Jan 12 '22 at 16:06
  • I tested this on Colab, and my results were different from yours. They were not ordered, so I cannot reproduce your results for `range(5)` etc. The results I obtained looked like your last example. – Frightera Jan 12 '22 at 16:31
  • @Frightera It's really weird. I restarted Jupyter and reran the whole code, and still got the same results as above. But when I reran the code on Colab, I got completely different results. Haha – HyeonPhil Youn Jan 12 '22 at 16:37
  • It probably has something to do with `from_generator`, not `interleave`, but I don't know the internal details of `from_generator`. In my experience it is a little bit problematic :( – Frightera Jan 12 '22 at 17:05

0 Answers