I am trying to use TPUEstimator with train_and_evaluate() for an experiment on GCMLE. The TPUEstimator has a required argument train_batch_size that obviously specifies the batch size. However, for train_and_evaluate() I also specify a batch size through the input_fn that I pass to the TrainSpec:
train_input = lambda: input_fn(
    filenames=hparams.train_files,
    batch_size=hparams.train_batch_size,
    hparams=hparams,
    num_epochs=hparams.num_epochs,
    shuffle=True,
    skip_header_lines=1
)

train_spec = tf.estimator.TrainSpec(train_input, max_steps=hparams.train_steps)

estimator = tpu_estimator.TPUEstimator(
    use_tpu=True,
    model_fn=model_fn,
    config=run_config,
    train_batch_size=hparams.train_batch_size,
    eval_batch_size=hparams.eval_batch_size,
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
In this example, note that train_input within train_spec has its own batch_size specified (for something like tf.train.batch() or tf.data.Dataset.batch()), while train_batch_size is also a required argument of TPUEstimator.
Having train_batch_size passed in two different places seems very sloppy to me. Is the recommendation just to make sure that the same batch size is passed to both TPUEstimator and the TrainSpec? If the batch_size in TPUEstimator differed from the batch_size in the TrainSpec passed to train_and_evaluate(), which one would take precedence? Is there a better way to use train_and_evaluate() with a TPUEstimator that does not require passing this batch_size in two different places?
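For what it's worth, the pattern I was imagining would look something like the sketch below, where the input_fn reads the batch size from the params dict that TPUEstimator populates, so the batch size is only specified once, on the estimator. This is just a guess at the intended usage: make_input_fn is my own hypothetical helper, the tf.data pipeline inside is a stand-in for my real one, and it assumes a TF version where Dataset.batch() supports drop_remainder:

import tensorflow as tf

def make_input_fn(filenames, hparams, shuffle=False, skip_header_lines=0):
    """Builds an input_fn whose batch size comes from TPUEstimator's params."""
    def input_fn(params):
        # TPUEstimator injects params['batch_size'], so nothing is
        # hard-coded here and the batch size lives only on the estimator.
        batch_size = params['batch_size']
        dataset = tf.data.TextLineDataset(filenames).skip(skip_header_lines)
        if shuffle:
            dataset = dataset.shuffle(buffer_size=10000)
        dataset = dataset.repeat(hparams.num_epochs)
        # TPUs require fixed shapes, hence drop_remainder=True
        dataset = dataset.batch(batch_size, drop_remainder=True)
        # (parsing each line into a (features, labels) tuple omitted)
        return dataset
    return input_fn

train_spec = tf.estimator.TrainSpec(
    make_input_fn(hparams.train_files, hparams, shuffle=True, skip_header_lines=1),
    max_steps=hparams.train_steps)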
Additionally, it appears that TPUEstimator automatically creates params['batch_size'], which the documentation describes as the "effective batch size". How does the effective batch size relate to train_batch_size? If my train_batch_size is 1024, is the "effective batch size" 128 (because of the 8 cores)?
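To make my assumption concrete, this is the arithmetic I have in mind (and it may be exactly backwards, i.e. params['batch_size'] could be the global 1024 rather than the per-core slice):

train_batch_size = 1024                                   # passed to TPUEstimator
num_tpu_cores = 8                                         # one Cloud TPU device
per_core_batch_size = train_batch_size // num_tpu_cores   # 1024 / 8 = 128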