
I'm trying to group my training examples by their length: https://www.tensorflow.org/versions/r0.12/api_docs/python/contrib.training/bucketing

However, I want to use the new Data API (tf.data). Is there a way to do this with it?
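For reference, here is a plain-Python sketch of how a sequence length maps to a bucket. It assumes the boundary semantics of `bucket_by_sequence_length`, where each boundary is an exclusive upper limit. Under that assumption, `bucket_boundaries = [5, 10]` gives three buckets: lengths below 5, lengths 5 to 9, and lengths of 10 or more. `bucket_index` is a hypothetical helper, not a TensorFlow function.

```python
from bisect import bisect_right


def bucket_index(length, boundaries):
    """Return the bucket a sequence of `length` falls into (hypothetical helper).

    Assumes bucket i holds lengths in [boundaries[i-1], boundaries[i]), with the
    first bucket open below and the last open above, so there are
    len(boundaries) + 1 buckets in total.
    """
    return bisect_right(boundaries, length)


bucket_boundaries = [5, 10]
for n in (3, 5, 9, 10, 12):
    print(n, "->", bucket_index(n, bucket_boundaries))
# 3 -> 0
# 5 -> 1
# 9 -> 1
# 10 -> 2
# 12 -> 2
```

With these boundaries, both 5-word sentences in the data below land in bucket 1 and both 10-word sentences land in bucket 2.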

Here is my code:

import tensorflow as tf

vocabulary = ["This", "is", "my", "first", "example",
              "the", "second", "one","How", "to", "bucket",
              "examples", "using", "new", "Data", "API"]

data = ["This is my first example",
        "How to bucket my examples using the new Data API",
        "This is the second one",
        "How to bucket my examples using the new Data API"]

BATCH_SIZE = 2

lookup_table = tf.contrib.lookup.index_table_from_tensor(vocabulary)
dataset = tf.data.Dataset.from_tensor_slices(data)


def tokenize(x):
    words = tf.string_split([x], " ").values
    return words


def lookup(x):
    ids = lookup_table.lookup(x)
    return ids


bucket_boundaries = [5, 10]


def bucketing(x):
    return tf.contrib.training.bucket_by_sequence_length(
        input_length=10,
        tensors=[x],
        batch_size=1,
        bucket_boundaries=bucket_boundaries,
        dynamic_pad=True
    )

# dataset = (dataset
#            .map(tokenize)
#            .map(lookup)
#            # .padded_batch(BATCH_SIZE, padded_shapes=[?])
#            )

dataset = (dataset
           .map(tokenize)
           .map(lookup)
           .map(bucketing)
           )

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

init_op = tf.group(tf.global_variables_initializer(),
                   tf.tables_initializer(),
                   iterator.initializer)

sess = tf.Session()
sess.run(init_op)

for i in range(len(data)):
    batch = sess.run(next_batch)
    print(batch)

I expect output like this, with each batch holding two sequences from the same length bucket:

[[0 1 2 3 4], [0 1 5 6 7]]

[[8 9 10 2 11 12 5 13 14 15], [8 9 10 2 11 12 5 13 14 15]]
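Purely to illustrate the expected behavior (this is not a tf.data solution), the grouping can be reproduced in plain Python. The sketch looks up each word's index in the vocabulary, assigns each sequence to a bucket, and emits a batch padded to its longest member once a bucket has collected BATCH_SIZE sequences. `bucket_and_batch`, `pad_batch` and `PAD_ID` are hypothetical names, and the boundary semantics are assumed to match `bucket_by_sequence_length`. Later TensorFlow releases provide a comparable transformation, tf.data.experimental.bucket_by_sequence_length, which this sketch does not use.

```python
from bisect import bisect_right

vocabulary = ["This", "is", "my", "first", "example",
              "the", "second", "one", "How", "to", "bucket",
              "examples", "using", "new", "Data", "API"]

data = ["This is my first example",
        "How to bucket my examples using the new Data API",
        "This is the second one",
        "How to bucket my examples using the new Data API"]

BATCH_SIZE = 2
bucket_boundaries = [5, 10]
PAD_ID = 0  # hypothetical padding id; a real pipeline might reserve a dedicated one


def pad_batch(batch, pad_id):
    """Right-pad every sequence in `batch` to the length of its longest member."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


def bucket_and_batch(sequences, boundaries, batch_size, pad_id=PAD_ID):
    """Yield padded batches whose sequences all fall into the same length bucket."""
    buckets = {}
    for seq in sequences:
        # Assumed semantics: boundaries are exclusive upper limits of each bucket.
        key = bisect_right(boundaries, len(seq))
        bucket = buckets.setdefault(key, [])
        bucket.append(seq)
        if len(bucket) == batch_size:
            yield pad_batch(bucket, pad_id)
            buckets[key] = []
    # Flush any partially filled buckets once the input is exhausted.
    for key in sorted(buckets):
        if buckets[key]:
            yield pad_batch(buckets[key], pad_id)


# Tokenize on spaces and look up each word's index in `vocabulary`.
word_to_id = {word: i for i, word in enumerate(vocabulary)}
ids = [[word_to_id[word] for word in sentence.split(" ")] for sentence in data]

batches = list(bucket_and_batch(ids, bucket_boundaries, BATCH_SIZE))
for batch in batches:
    print(batch)
# [[0, 1, 2, 3, 4], [0, 1, 5, 6, 7]]
# [[8, 9, 10, 2, 11, 12, 5, 13, 14, 15], [8, 9, 10, 2, 11, 12, 5, 13, 14, 15]]
```

With these four sentences, neither batch needs padding because each bucket receives sequences of identical length. Padding only matters when a bucket mixes lengths, for example 5 and 9.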

Instead, the code above throws an OutOfRangeError:

OutOfRangeError (see above for traceback): End of sequence

Ilia Vatahov
