
Given some data in a dataset (or tensor), e.g. tensor = tf.constant([1, 2, 3, 4, 5, 6, 7]),

I need to create minibatches of N tuples, each holding M consecutive elements (say 4 x 3), drawn with replacement. An example minibatch could be:

[[1, 2, 3], [3, 4, 5], [2, 3, 4], [5, 6, 7]]

The aim is to avoid creating a dataset in this form:

[[1, 2, 3],
 [2, 3, 4],
 [4, 5, 6]]

because of the massive redundancy. The batches should be created on the fly as I feed new mini-batches into the training process.
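To make the redundancy concrete, here is a minimal sketch in plain Python (not TensorFlow). The helper name window_count is hypothetical and only for illustration. It counts how many elements a fully materialized window dataset would store, and shows that a minibatch is fully described by its start indices into the original data.

```python
def window_count(n, m, stride=1):
    """Number of length-m windows that fit in a sequence of length n."""
    return (n - m) // stride + 1 if n >= m else 0


data = [1, 2, 3, 4, 5, 6, 7]
m = 3  # window length (M in the question)

n_windows = window_count(len(data), m)
stored_if_materialized = n_windows * m  # elements stored if every window is copied

# A minibatch only needs N start indices; the windows are sliced on demand.
starts = [0, 2, 1, 4]
batch = [data[s:s + m] for s in starts]

print(n_windows, stored_if_materialized, len(data))
print(batch)
```

With stride 1 the materialized form stores roughly M times as many elements as the original sequence, which is the overhead I want to avoid.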

Thomas Smyth - Treliant
Roman

1 Answer


I found one way here. Do you think this is optimal, or is it better to use queues directly?

This code is based on the above link:

import tensorflow as tf
import numpy as np


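def gen_batch():
    # Yield sliding windows of length batch_size over sequence, one per step.

    # compute number of windows to emit; the + 1 includes the final window
    num_of_batches = (len(sequence) - batch_size) // stride + 1

    # emit windows starting at 0, stride, 2 * stride, ...
    for i in range(0, num_of_batches * stride, stride):
        result = np.array(sequence[i:i + batch_size])
        yield result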


sequence = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
batch_size = 3
stride = 1

ds = tf.data.Dataset.from_generator(gen_batch, tf.float64)
ds = ds.shuffle(100)
ds_out = ds.make_one_shot_iterator().get_next()

sess = tf.Session()

print(sess.run(ds_out))
print(sess.run(ds_out))
print(sess.run(ds_out))
print(sess.run(ds_out))
print(sess.run(ds_out))

prints:

[3. 4. 5.]
[1. 2. 3.]
[2. 3. 4.]
[4. 5. 6.]
[5. 6. 7.]
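Note that shuffling visits each window once per pass, so it samples without replacement. If true sampling with replacement is needed, one option is to draw the window start indices at random inside the generator itself. The following is a minimal sketch in plain Python under that assumption; the function sample_batches and its parameters n_windows, window_len and seed are hypothetical names, not part of the linked code.

```python
import random


def sample_batches(sequence, n_windows, window_len, seed=None):
    """Yield minibatches forever; each holds n_windows windows drawn with replacement."""
    rng = random.Random(seed)
    n_starts = len(sequence) - window_len + 1  # number of valid start positions
    if n_starts <= 0:
        raise ValueError("window_len is longer than the sequence")
    while True:
        # Independent draws, so the same window may appear more than once.
        starts = [rng.randrange(n_starts) for _ in range(n_windows)]
        yield [list(sequence[s:s + window_len]) for s in starts]


gen = sample_batches([1, 2, 3, 4, 5, 6, 7], n_windows=4, window_len=3, seed=0)
first = next(gen)
print(first)
```

Only the original sequence is held in memory, and each minibatch is sliced out when requested. It should be possible to pass a generator like this to tf.data.Dataset.from_generator (wrapped in a lambda, since a callable is expected) in place of gen_batch, in which case the shuffle step is no longer needed.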
Roman