It's unclear to me what the buffer_size
parameter of tf.data.Dataset.shuffle does. Let's say we have the following code:
dataset = dataset.shuffle(buffer_size=10000).repeat().batch(batch_size)
Does this mean that only the first 10,000 samples will be used and repeated forever, or will the pipeline eventually go through the entire dataset? If not, what does buffer_size do exactly? And what about this code, where repeat comes before shuffle?
dataset = dataset.repeat().shuffle(buffer_size=10000).batch(batch_size)
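For context, my current understanding (which may well be wrong, hence this question) is that shuffle keeps a fixed-size buffer: it fills the buffer with buffer_size elements, and each time an element is requested it picks one uniformly at random from the buffer and replaces it with the next element from the input stream. Here is a pure-Python sketch of that buffered-shuffle idea; the function name and structure are mine, not TensorFlow's:

```python
import random

def buffered_shuffle(iterable, buffer_size, rng=None):
    """Sketch of a buffered shuffle: maintain a buffer of up to
    `buffer_size` elements; yield a uniformly random element from the
    buffer and replace it with the next element from the stream."""
    rng = rng or random.Random()
    buf = []
    for item in iterable:
        if len(buf) < buffer_size:
            # Still filling the buffer: nothing is yielded yet.
            buf.append(item)
        else:
            # Buffer is full: emit a random buffered element and
            # put the newly read element in its slot.
            i = rng.randrange(buffer_size)
            yield buf[i]
            buf[i] = item
    # Input exhausted: drain the remaining buffer in random order.
    rng.shuffle(buf)
    yield from buf

out = list(buffered_shuffle(range(100), buffer_size=10, rng=random.Random(0)))
print(sorted(out) == list(range(100)))  # every element appears exactly once
```

If that sketch is accurate, then every sample is eventually used (not just the first buffer_size of them), but the shuffle is only "local": an element can't end up arbitrarily far from its original position unless buffer_size is large relative to the dataset. I'd like confirmation of whether that's actually how it works.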
I've noticed this post, but it doesn't say anything about buffer_size.