
It is recommended to use a tf.data.Dataset as the input pipeline, which can be set up as follows:

# Specify dataset
dataset  = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle (buffer_size must be an integer)
dataset  = dataset.shuffle(buffer_size=100000)
# Specify batch size
dataset  = dataset.batch(128)
# Create an iterator
iterator = dataset.make_one_shot_iterator()
# Get next batch
next_batch = iterator.get_next()

I should be able to get the batch size, either from the dataset itself or from an iterator created from it (i.e., from both iterator and next_batch). Someone might also want to know how many batches there are in the dataset or its iterators, how many batches have already been consumed and how many remain, or how to retrieve particular elements, or even the entire dataset at once.

I wasn't able to find anything in the TensorFlow documentation. Is this possible? If not, does anyone know whether this has been requested as an issue on the TensorFlow GitHub?

Miladiouss

3 Answers


Try this:

import tensorflow as tf
import numpy as np

features = np.array([[3.0, 0.0], [1.0, 2.0], [0.0, 0.0]], dtype="float32")
labels = np.array([[0], [0], [1]], dtype="float32")
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

batch_size = 2
dataset = dataset.batch(batch_size)
# TF1-style initializable iterator
iterator = dataset.make_initializable_iterator()
batch_data = iterator.get_next()
with tf.Session() as sess:
    sess.run(iterator.initializer)
    # batch_data is a (features, labels) tuple; the leading dimension of
    # the evaluated features tensor is the size of the current batch
    print(np.shape(sess.run(batch_data)[0])[0])
and it will print 2, the size of the first batch.
guorui

In TF2 at least, the type of a dataset is statically defined and accessible via tf.data.Dataset.element_spec.

This is a somewhat complex return type because it has tuple nesting that matches your Dataset.

>>> tf.data.Dataset.from_tensor_slices([[[1]],[[2]]]).element_spec.shape
TensorShape([1, 1])

If your data is organized as an (image, label) tuple, then you'd get a tuple of TensorSpecs. You can index into it if you are certain of the nesting of the return type, e.g.

>>> image = tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2, drop_remainder=True)
>>> label = tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2, drop_remainder=True)
>>> train = tf.data.Dataset.zip((image, label))
>>> train.element_spec[0].shape[0]
2
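
Note that the batch dimension is only statically known here because of drop_remainder=True; without it, the leading dimension of the element_spec is None (a quick check, assuming TF2):

>>> tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2).element_spec.shape
TensorShape([None, 1])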
Yaoshiang

In TF2, tf.data.Datasets are iterables, so you can get a batch by simply doing:

batch = next(iter(dataset))

and then calculating the batch size is trivial since it becomes the size of the first dimension:

batch_size = batch.shape[0]

So a complete example would look like:

# Specify dataset
dataset  = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle (buffer_size must be an integer)
dataset  = dataset.shuffle(buffer_size=100000)
# Specify batch size
dataset  = dataset.batch(128)
# Calculate the batch size ([0] selects the features tensor from the (features, labels) tuple)
batch_size = next(iter(dataset))[0].shape[0]
print('Batch size:', batch_size) # prints 128

Or, if you need it as a function:

def calculate_batch_size(dataset):
    return next(iter(dataset)).shape[0]

Note that iterating over a dataset requires eager execution. Moreover, this solution assumes that your dataset is batched and may raise errors if this is not the case. You may also run into errors if, after batching, you perform other operations on your dataset that change the shape of its elements.
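
For datasets whose elements are tuples, like the (features, labels) dataset above, the helper can be made a bit more robust by flattening the element structure before reading the leading dimension. A minimal sketch (assuming TF2 eager execution; the helper name and the random data are just for illustration):

import tensorflow as tf

def calculate_batch_size(dataset):
    # Pull one element from the (assumed batched) dataset and flatten it,
    # so that tuple elements such as (features, labels) are handled too;
    # the leading dimension of the first tensor is the batch size.
    element = next(iter(dataset))
    first_tensor = tf.nest.flatten(element)[0]
    return first_tensor.shape[0]

features = tf.random.uniform((300, 2))
labels = tf.random.uniform((300, 1))
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(128)
print(calculate_batch_size(dataset))  # prints 128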

ruancomelli